Information complexity is computable

Mark Braverman* Jon Schneider^ 


Abstract 

The information complexity of a function $f$ is the minimum amount of information
Alice and Bob need to exchange to compute the function $f$. In this paper we provide
an algorithm for approximating the information complexity of an arbitrary function
$f$ to within any additive error $\alpha > 0$, thus resolving an open question as to whether
information complexity is computable.

In the process, we give the first explicit upper bound on the rate of convergence of
the information complexity of $f$ when restricted to $b$-bit protocols to the (unrestricted)
information complexity of $f$.


1 Introduction 

In 1948, Shannon introduced the field of information theory as a set of tools for understanding 
the limits of one-way communication ra- One of these tools, the information entropy 
function H(X), measures the amount of information contained in a random source X. 

The analogue of information entropy in communication complexity is information
complexity. The information complexity of a function $f$ is the least amount of information Alice
and Bob need to exchange about their inputs to compute $f$. Just as the information
entropy of a random source $X$ provides a lower bound on the amount of communication
required to transmit $X$, the information complexity of a function $f$ provides a lower bound on
the communication complexity of $f$ [4]. Moreover, just as Shannon's source coding theorem
establishes $H(X)$ as the asymptotic communication-per-message required to send multiple
independent copies of $X$, the information complexity of $f$ is the asymptotic
communication-per-copy required to compute multiple copies of $f$ in parallel on independently distributed
inputs HIE]. 

These properties make information complexity a valuable tool for proving results in
communication complexity. Communication complexity lower bounds themselves have a wide
variety of applications to other areas of computer science; for example, results in circuit
complexity such as Karchmer-Wigderson games and ACC lower bounds rely on communication
complexity lower bounds P2U2]. In addition, techniques from information complexity have 
been applied to prove various direct sum results in communication complexity mm,
including the only known direct sum results for general randomized communication complexity
[1]. Information complexity has also been applied to prove a tight asymptotic bound on the
communication complexity of the set disjointness function [5].

Despite this, many fundamental properties of information complexity remain unknown
[5]. It is unknown how the information complexity of a function changes asymptotically as we
allow the protocol to fail with probability $\varepsilon$. It is unknown how the information complexity
of a function grows if we restrict our attention to protocols of bounded depth. Perhaps
most surprisingly, it is even unknown whether, given the truth table of a function $f$, it is
possible to compute (to within some additive error $\varepsilon$) the information complexity of
$f$. (Contrast this with the case of the information entropy $H(X)$, which is easily computed
given the distribution of $X$.)

In this paper, we resolve the last of these questions; we prove that the information
complexity of $f$ is indeed computable. Our main technical result is an explicit bound on
the convergence rate of $r$-round information complexity to the unbounded-round information
complexity. More specifically, we show how to convert an arbitrary protocol $\pi$ into a protocol
$\pi'$ that leaks at most $\varepsilon$ more information than $\pi$, but requires at most $(N\varepsilon^{-1})^{O(N)}$ rounds
(Theorem 3.20). Equivalently, we show that the $r$-round information complexity of $f$ is at
most $N \cdot r^{-\Omega(1/N)}$ larger than the information complexity of $f$. By combining this convergence
result with prior results connecting information and communication complexity, we obtain
an algorithm that computes the information complexity of $f$ to within an additive error of
$\alpha$ in time $2^{\exp((N\alpha^{-1})^{O(N)})}$ (here $N$ is the size of the truth table of $f$).

1.1 Prior Work 

In [13, 14], Ma and Ishwar present a method to compute tight bounds on the information
complexity of functions for protocols restricted to $r$ rounds of computation. By examining
the limit as $r$ tends to infinity, this method allows them to numerically compute the
information complexity of several functions (such as the 2-bit AND function). To make these
computations provably correct, one would need effective (computable) estimates on the rate
of convergence of $r$-round information complexity to the true information complexity. Such
estimates were unknown prior to the present paper.

Plenty of unsolved problems of this flavor, where the computability of some limiting
value is unknown despite it being straightforward to compute individual terms of this limit,
occur in information theoretic contexts. One famous problem is the problem of computing
the Shannon capacity of a graph, the amortized independence number of the $n$-th power of a
graph (this limiting quantity also has an interpretation as the zero-error channel capacity of
a certain channel defined by this graph). While computing the independence number of any
given graph is possible (albeit NP-hard), the rate at which this limit converges is unknown.
Indeed, Alon and Lubetzky have shown that the limiting behavior of this quantity can be 
quite complex; no fixed number of terms of this limit is guaranteed to give a subpolynomial 
approximation to the Shannon capacity [T]. Another example, from the realm of quantum 
information theory, occurs in computing the quantum value of games [S]. Here it is
straightforward to compute the quantum value of a game when limited to $n$ bits of entanglement,
but no explicit bounds are known for how many bits of entanglement are required to achieve
within $\varepsilon$ of optimal performance.

1.2 Outline of Proof 

The main result of our paper is that zero-error information complexity is computable.
Formally, we prove the following theorem.

Theorem 1.1. There exists an algorithm which, given a function $f : \mathcal{A} \times \mathcal{B} \to \{0,1\}$, initial
distribution $\mu \in \Delta(\mathcal{A} \times \mathcal{B})$, and a real number $\alpha > 0$, returns a value $C$ between $\mathrm{IC}_\mu(f) - \alpha$
and $\mathrm{IC}_\mu(f) + \alpha$. This can be performed in time $2^{\exp((N\alpha^{-1})^{O(N)})}$, where $N = |\mathcal{A} \times \mathcal{B}|$.

Throughout this paper, we will take the perspective of an outside observer watching in
as Alice and Bob execute some protocol. This observer starts with some probabilistic belief
about the inputs of Alice and Bob (initially this is just $\mu$, the distribution of inputs to Alice
and Bob). As Alice and Bob execute the protocol, they send each other signals (Bernoulli
random variables that contain information about their inputs), which cause the observer to
update his belief. The total amount of information leaked by the protocol to the participants
can then be represented directly in terms of the final belief and initial belief (Lemma 2.11).
These notions are defined in more detail in Section 2.

The strategy of the proof is as follows. We start with a general protocol $\pi$ for solving $f$
whose information cost is very close to the information complexity of $f$. Unfortunately,
we do not know anything about $\pi$ besides the fact that it is a finite, discrete protocol that
computes $f$ without error. Note that if we could restrict $\pi$ to a finite family of protocols
(e.g. protocols that send at most $b$ bits, for an explicit bound $b = b(\alpha, N)$), then we could
just brute force over all such $\pi$'s and compute the approximate information complexity of $f$.
The proof shows that, indeed, there is always a protocol $\pi'$ that can be derived from $\pi$ and
that belongs to such an explicit family. The proof proceeds in several steps. In each step,
more structure is added to $\pi$ (structure that is then exploited by the following steps). The
difficulty is, of course, ensuring at each step that $\pi$ can be replaced with a more structured
protocol $\pi'$ while increasing its information cost by only, say, $\alpha/10$. Ultimately, we manage
to turn $\pi$ into a protocol with $r$ back-and-forth rounds, where $r$ is an explicit function of $N$
and $\alpha$. Finally, it is shown that an $r$-round protocol can be replaced with a
$b$-bit protocol where $b = b(\alpha, N, r) = b(\alpha, N)$ is an explicit function, while only increasing
its information cost by a controlled amount, completing the proof.

The actual proof of Theorem 1.1 is roughly structured into three parts. In the first part,
we begin by showing that we can 'discretize' any protocol $\pi$; that is, we can simulate any
protocol $\pi$ with a protocol $\pi'$ that only uses a bounded number of different types of signals,
but that only reveals a marginal amount of additional information. This takes several steps.
In Section 3.1, we show that it suffices to only consider initial distributions $\mu$ with full
support. In Section 3.2, we show that we can modify protocols so that they never use signals
too close to the boundary of $\Delta(\mathcal{A} \times \mathcal{B})$. In Section 3.3, we show that, by dividing up large
signals into smaller parts, we can modify protocols so that all signals have roughly the same
'size'. Finally, in Section 3.4, we show that for protocols with the previous three properties,
we can apply a 'rounding scheme' to each signal in this protocol and end up with a bound
on the number of different signals in our new protocol.

In the second part, we show in Section 3.5 that we can transform any suitably discrete
protocol $\pi$ (i.e. one that uses an explicitly bounded number of distinct signals) into a protocol
that uses few rounds. We achieve this via a bundling scheme; the main idea is that, where
Alice would ordinarily send Bob one instance of a signal, she instead sends Bob several
instances of this signal. Then, the next several times Alice would send that signal to Bob,
Bob simply refers to the next unused copy sent by Alice, thus decreasing the number of
rounds in the protocol.

Combining the above steps allows us to prove the following bound on the convergence 
rate of r-round information complexity. 

Theorem 3.20. Let $\pi$ be a communication protocol with information cost $C$ that successfully
computes function $f$ over inputs drawn from distribution $\mu$ over $\mathcal{A} \times \mathcal{B}$. Then there exists a
protocol $\pi'$ with information cost at most $C + \varepsilon$ that also successfully computes $f$ over inputs
drawn from $\mu$, but that uses at most $w(f,\varepsilon)$ alternations, where

$$w(f,\varepsilon) = (N\varepsilon^{-1})^{O(N)} \tag{1.1}$$

and $N = |\mathcal{A} \times \mathcal{B}|$.

Finally, in the third part of the proof, in Section 3.6, we demonstrate how to approximate
the bounded-round information complexity of a function by computing the communication 
complexity of several parallel copies of this function. We accomplish this by combining 
an existing result of Braverman and Rao on the compression of bounded-round protocols 
with a direct sum result for information complexity. Since we can compute (albeit fairly 
inefficiently) the communication complexity of any function by enumerating all possible 
protocols of a certain length, this completes our proof. 

The proof we provide below shows that zero-error internal information complexity is 
computable, but the same method (with a modification to Section 13.61 see Remark 13.•101) 
also shows that zero-error external information complexity is computable. We believe similar 
techniques can be used to show that $\varepsilon$-error information complexity is computable, but do
not include such a proof in this paper. 

1.3 Open Problems 

Naturally, the most immediate open problem arising from our work is understanding whether
(and how much) the rate of convergence in Theorem 3.20 can be improved:

Open Problem 1.2. What is the (worst case) rate of convergence of the $r$-round information
complexity of $f$ to $\mathrm{IC}_\mu(f)$? In other words, for a given $\varepsilon > 0$ and truth table size $N = |\mathcal{A} \times \mathcal{B}|$,
how large does $r(N, \varepsilon)$ need to be to ensure that the $r$-round information complexity $\mathrm{IC}^r_\mu(f)$
satisfies

$$\mathrm{IC}^r_\mu(f) < \mathrm{IC}_\mu(f) + \varepsilon\,?$$

In this paper we prove that $r(N, \varepsilon) \le (N\varepsilon^{-1})^{O(N)}$. On the other hand, [5] shows that
when $f$ is the two-bit AND (and thus $N = 4$ is a constant), the tight estimate for $r$ is
$r = \Theta(\varepsilon^{-1/2})$. Therefore, the polynomial dependence on $\varepsilon$, even when $N$ is a constant, is
necessary. On the other hand, we do not have any interesting lower bounds on $r$ in terms
of $N$. In particular, it is not known whether the exponential dependence on $N$ is necessary
here.

The second open problem is in a similar vein, asking whether Theorem 1.1 can be
improved.

Open Problem 1.3. What is the computational complexity of computing the (zero-error
internal) information complexity of a function $f$ within error $\alpha$ given its truth table? By
how much can the bound of $2^{\exp((N\alpha^{-1})^{O(N)})}$ be improved?

By the analysis in Section 3.6, any progress on Problem 1.2 will translate into progress on
Problem 1.3. For comparison, it is not hard to see that the trivial algorithm for computing
the average-case communication complexity of a function $f : [n] \times [n] \to \{0,1\}$ (so that
$N = n^2$) within an additive error $\alpha$ runs in time $2^{\exp((N\alpha^{-1})^{O(1)})}$. In other words,
there is an exponential gap between the trivial communication complexity upper bound and
the bound we obtain in Theorem 1.1.

2 Preliminaries 

2.1 Information Theory 

We briefly review some standard information theoretic definitions used throughout this
paper. For a more detailed introduction, we refer the reader to [10].

Definition 2.1 (Entropy). The entropy of a random variable $X$ is
$H(X) = \sum_x \Pr[X = x] \log(1/\Pr[X = x])$. The conditional entropy $H(X \mid Y)$ is defined to be $\mathbb{E}_{y \sim Y}[H(X \mid Y = y)]$.

Definition 2.2 (Mutual Information). The mutual information between two random
variables $A, B$, denoted $I(A; B)$, is defined to be the quantity $H(A) - H(A \mid B)$. The conditional
mutual information $I(A; B \mid C)$ is $H(A \mid C) - H(A \mid BC)$.

Definition 2.3 (Divergence). The informational divergence (also known as Kullback-Leibler
distance or relative entropy) between two distributions $A$ and $B$ is

$$D(A \,\|\, B) = \sum_x A(x) \log\big(A(x)/B(x)\big).$$

Proposition 2.4 (Chain Rule). Let $C_1, C_2, D, B$ be random variables. Then
$I(C_1 C_2; B \mid D) = I(C_1; B \mid D) + I(C_2; B \mid C_1 D)$.

We will regularly make use of the following inequality for conditional mutual information.

Lemma 2.5. Let $A, B, C, D$ be four random variables such that $I(B; D \mid AC) = 0$. Then

$$I(A; B \mid C) \ge I(A; B \mid CD).$$

Proof. We apply the chain rule twice:

$$
\begin{aligned}
I(A; B \mid CD) &= I(AD; B \mid C) - I(D; B \mid C) \\
&= I(A; B \mid C) + I(D; B \mid AC) - I(D; B \mid C) \\
&= I(A; B \mid C) - I(D; B \mid C) \\
&\le I(A; B \mid C).
\end{aligned}
$$

□
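
As a numerical sanity check of the chain rule and of Lemma 2.5, the following is a minimal Python sketch; it is an illustration only and not part of the original argument. The helper names `marginal`, `entropy`, `cmi` and `random_joint`, as well as the randomly generated test distribution, are assumptions made for this example. Variables are indexed as A=0, B=1, C=2, D=3, and D is generated from (A, C) alone so that the hypothesis I(B; D | AC) = 0 holds.

```python
import itertools
import math
import random

def marginal(joint, idx):
    """Marginalize a joint distribution {outcome tuple: prob} onto coordinates idx."""
    out = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(pr * math.log2(pr) for pr in dist.values() if pr > 0)

def cmi(joint, xs, ys, zs=()):
    """Conditional mutual information I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)."""
    h = lambda idx: entropy(marginal(joint, idx))
    return h(xs + zs) + h(ys + zs) - h(xs + ys + zs) - h(zs)

def random_joint(seed=0):
    """Hypothetical test distribution on bits (A, B, C, D), with D drawn from (A, C) only."""
    rng = random.Random(seed)
    abc = {o: rng.random() + 0.05 for o in itertools.product((0, 1), repeat=3)}
    total = sum(abc.values())
    d_given_ac = {ac: rng.random() for ac in itertools.product((0, 1), repeat=2)}
    joint = {}
    for (a, b, c), w in abc.items():
        t = d_given_ac[(a, c)]          # Pr[D = 1 | A = a, C = c]
        joint[(a, b, c, 0)] = (w / total) * (1 - t)
        joint[(a, b, c, 1)] = (w / total) * t
    return joint

A, B, C, D = (0,), (1,), (2,), (3,)
joint = random_joint()
hypothesis = cmi(joint, B, D, A + C)    # I(B;D|AC), zero by construction
# Chain rule: I(AD;B|C) = I(D;B|C) + I(A;B|CD)
chain_gap = cmi(joint, A + D, B, C) - (cmi(joint, D, B, C) + cmi(joint, A, B, C + D))
# Lemma 2.5: I(A;B|C) - I(A;B|CD) should be nonnegative
lemma_gap = cmi(joint, A, B, C) - cmi(joint, A, B, C + D)
```

Up to floating-point error, `hypothesis` and `chain_gap` vanish and `lemma_gap` is nonnegative, matching the last two lines of the proof.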


2.2 Protocols and Information Complexity 

In the two-party communication setting, Alice is given an element $a$ from a finite set $\mathcal{A}$, while
Bob is given an element $b$ from a finite set $\mathcal{B}$, where $(a, b)$ is drawn from some distribution $\mu$
over $\mathcal{A} \times \mathcal{B}$. Their goal is to compute $f(a, b)$, where $f : \mathcal{A} \times \mathcal{B} \to \{0,1\}$ is a function known
to both parties. They would like to accomplish this while revealing as little information as
possible, either to each other (in the case of information cost) or to an outside observer (in
the case of external information cost). To do this, they execute a communication protocol,
which we view as being built out of signals.

Definition 2.6. A signal $\sigma$ over a set $S$ is an assignment of a probability $\sigma_s \in [0,1]$ to each
element $s$ in $S$. For a given element $s$ of $S$, we define $\sigma(s)$ to be the Bernoulli random variable
that equals 1 with probability $\sigma_s$. The size of a signal $\sigma$ is given by $|\sigma| = \max_{s \in S} |\tfrac{1}{2} - \sigma_s|$.

Definition 2.7. A communication protocol $\pi$ is a finite rooted binary tree, where each
non-leaf node is labeled by either a signal over $\mathcal{A}$ (corresponding to Alice's move) or a signal
over $\mathcal{B}$ (corresponding to Bob's move), and each edge is labeled either 0 or 1. Alice and Bob
can execute this protocol by starting at the root and repeatedly performing the following
procedure: if the signal $\sigma$ at the current node is a signal over $\mathcal{A}$, Alice sends Bob an instance
of $\sigma(a)$, and they both move down the corresponding edge; likewise, if the signal is a signal
over $\mathcal{B}$, Bob performs the analogous procedure.

Each leaf node is labeled with a value 0 or 1. We say the communication protocol
successfully computes $f$ with zero error if the value of the leaf node Alice and Bob finish
the protocol on is always equal to $f(a,b)$ for all $(a, b) \in \mathcal{A} \times \mathcal{B}$ (in particular, even $(a, b)$
where $\mu(a, b) = 0$). The communication cost $\mathrm{CC}(\pi)$ of protocol $\pi$ is equal to the depth of
the deepest leaf in $\pi$.

This agrees with the usual definition of a private coins protocol (indeed, any bit Alice 
can ever send in any protocol must be a signal over A, and likewise for Bob). A public coins 
protocol is simply a distribution over private coins protocols. For our purposes, it suffices to 
solely examine private coins protocols, since the information cost of a public coins protocol 
is simply the expected information cost of the corresponding private coins protocols. 

As is standard, we will let $A$ and $B$ be random variables representing Alice's input
and Bob's input respectively, and let $\Pi$ be the random variable representing the protocol's
transcript. We can then define the information cost of a protocol and the information
complexity of a function as follows.

Definition 2.8. The information cost of a protocol $\pi$ is given by

$$\mathrm{IC}_\mu(\pi) = I(A; \Pi \mid B) + I(B; \Pi \mid A).$$

The external information cost of a protocol $\pi$ is given by

$$\mathrm{IC}^{\mathrm{ext}}_\mu(\pi) = I(AB; \Pi).$$

Definition 2.9. The information complexity of a function $f$ is given by

$$\mathrm{IC}_\mu(f) = \inf_\pi \mathrm{IC}_\mu(\pi)$$

where the infimum is over all protocols $\pi$ that successfully compute $f$. Likewise, the external
information complexity of a function $f$ is given by

$$\mathrm{IC}^{\mathrm{ext}}_\mu(f) = \inf_\pi \mathrm{IC}^{\mathrm{ext}}_\mu(\pi)$$

where again, the infimum is over all protocols $\pi$ that successfully compute $f$.

Throughout the remainder of this paper, it will be useful to think of signals as operating
on the space $\Delta(\mathcal{A} \times \mathcal{B})$ of probability distributions over $\mathcal{A} \times \mathcal{B}$, which we term beliefs. At the
beginning of a protocol, an outside observer's belief is simply given by $\mu$, the distribution
$(a, b)$ was drawn from. As this observer observes new signals, his belief evolves according to
Bayes' rule; for example, if he observes the signal $\sigma(a)$ sent by Alice, his belief changes from
the prior belief $p$ to the posterior belief

$$p_0(a,b) = \frac{(1 - \sigma_a)\,p(a,b)}{\sum_{i,j} (1 - \sigma_i)\,p(i,j)} \tag{2.1}$$

if $\sigma(a) = 0$ (which occurs with probability $P_0 = \sum_{i,j} (1 - \sigma_i)\,p(i,j)$) and to the posterior
belief

$$p_1(a,b) = \frac{\sigma_a\,p(a,b)}{\sum_{i,j} \sigma_i\,p(i,j)} \tag{2.2}$$

if $\sigma(a) = 1$ (which occurs with probability $P_1 = \sum_{i,j} \sigma_i\,p(i,j)$). As shorthand, we will
say that $\sigma$ shifts belief $p$ to $(p_0, p_1)$. Note that the probabilities $P_0$ and $P_1$ are uniquely
recoverable given $p_0$ and $p_1$ (in particular, treating beliefs as vectors in $\mathbb{R}^{\mathcal{A} \times \mathcal{B}}$, it must be
the case that $P_0 p_0 + P_1 p_1 = p$ and that $P_0 + P_1 = 1$). If $P_0 = P_1 = \tfrac{1}{2}$, we say the signal is
balanced for the belief $p$ (when it is clear from context, we will omit which belief $p$ the signal
is balanced for).

We can write similar equations that describe the change in beliefs upon observing the
signal $\sigma(b)$ sent by Bob. An important consequence of equations (2.1) and (2.2) is that signals
commute. That is, sending signal $\sigma$ followed by signal $\sigma'$ results in the same probability
distribution over beliefs as sending signal $\sigma'$ followed by signal $\sigma$.
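
The Bayesian update of equations (2.1) and (2.2), and the fact that signals commute, can be illustrated with a short Python sketch. The function names `shift` and `two_signals` and the concrete numbers are hypothetical choices for this example; the sketch assumes both outcomes of every signal have positive probability.

```python
def shift(belief, sigma, sender):
    """Apply Bayes' rule for one signal.

    belief: dict {(a, b): prob}; sigma: dict {input: Pr[bit = 1]};
    sender: 0 if Alice sends (sigma indexed by a), 1 if Bob sends (indexed by b).
    Returns [(P0, p0), (P1, p1)], assuming P0 > 0 and P1 > 0.
    """
    branches = []
    for bit in (0, 1):
        weight = {ab: (sigma[ab[sender]] if bit else 1 - sigma[ab[sender]]) * pr
                  for ab, pr in belief.items()}
        total = sum(weight.values())            # probability of observing this bit
        branches.append((total, {ab: w / total for ab, w in weight.items()}))
    return branches

def two_signals(belief, first, second):
    """Final beliefs after two signals, keyed by the pair of bits observed."""
    out = {}
    for x, (Px, px) in enumerate(shift(belief, *first)):
        for y, (Py, py) in enumerate(shift(px, *second)):
            out[(x, y)] = (Px * Py, py)
    return out

mu = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
alice = ({0: 0.2, 1: 0.7}, 0)   # hypothetical signal over Alice's inputs
bob = ({0: 0.5, 1: 0.9}, 1)     # hypothetical signal over Bob's inputs

(P0, p0), (P1, p1) = shift(mu, *alice)
ab_order = two_signals(mu, alice, bob)   # Alice's signal first, then Bob's
ba_order = two_signals(mu, bob, alice)   # Bob's signal first, then Alice's
```

Here `P0 * p0 + P1 * p1` recovers `mu` entrywise, and `ab_order[(x, y)]` agrees with `ba_order[(y, x)]` in both probability and posterior belief, since either order multiplies each entry of the belief by the same two factors before normalizing.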

Given a protocol $\pi$, we can label all nodes of the protocol tree with the belief an observer
would have at that point in the protocol. We can therefore alternatively express the
information cost and external information cost of a protocol as a function of the final beliefs at
the leaves of the protocol.








Definition 2.10. The information cost at a node $v$ of protocol $\pi$ with belief $p = p_v$ is
defined to be:

$$C(p) = \mathbb{E}_{a \sim p}\big[D(p(b \mid a) \,\|\, \mu(b \mid a))\big] + \mathbb{E}_{b \sim p}\big[D(p(a \mid b) \,\|\, \mu(a \mid b))\big] \tag{2.3}$$

The external information cost at a node $v$ of protocol $\pi$ with belief $p$ is defined to be:

$$C^{\mathrm{ext}}(p) = D\big(p(a,b) \,\|\, \mu(a,b)\big) \tag{2.4}$$

Lemma 2.11. The information cost of a protocol is the expected value of the information 
cost at the leaves of the protocol. The external information cost of a protocol is the expected 
value of the external information cost at the leaves of the protocol. 

Proof. We demonstrate the computation for information cost; the computation for
external information cost is similar. Write $P(a, b, \pi)$ as shorthand for $\Pr[(A, B, \Pi) = (a, b, \pi)]$.
Note that

$$
\begin{aligned}
I(A; \Pi \mid B) &= \sum_b P(b) \sum_{a,\pi} P(a, \pi \mid b) \log \frac{P(a, \pi \mid b)}{P(a \mid b)\,P(\pi \mid b)} \\
&= \sum_b P(b) \sum_{a,\pi} P(\pi \mid b)\,P(a \mid \pi, b) \log \frac{P(a \mid \pi, b)}{\mu(a \mid b)} \\
&= \sum_{b,\pi} P(b)\,P(\pi \mid b) \sum_a P(a \mid \pi, b) \log \frac{P(a \mid \pi, b)}{\mu(a \mid b)} \\
&= \sum_{b,\pi} P(\pi)\,P(b \mid \pi)\, D\big(P(a \mid \pi, b) \,\|\, \mu(a \mid b)\big) \\
&= \sum_\pi P(\pi) \sum_b P(b \mid \pi)\, D\big(P(a \mid \pi, b) \,\|\, \mu(a \mid b)\big) \\
&= \sum_\pi P(\pi)\, \mathbb{E}_{b \sim p}\big[D(p(a \mid b) \,\|\, \mu(a \mid b))\big].
\end{aligned}
$$

The last equality follows from the fact that $P(a \mid b, \pi)$ is simply the belief about $a$ given
$b$ at the leaf given by the transcript $\pi$, and hence is $p(a \mid b)$ (likewise, $P(b \mid \pi)$ equals $p(b)$ at
that leaf). Combining this with the analogous equation for $I(B; \Pi \mid A)$, we find that $\mathrm{IC}_\mu(\pi)$
is exactly the expected value of $C(p)$ over the leaves of the protocol, as desired. □
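
To make Definition 2.10 and Lemma 2.11 concrete, the sketch below computes $C(p)$ and $C^{\mathrm{ext}}(p)$ for a toy one-signal protocol in which Alice sends a single signal, and compares the expected leaf costs with the mutual-information definitions. The names `marg`, `info_cost`, `ext_cost` and the numerical values are illustrative assumptions. Since Bob sends nothing, $I(B; \Pi \mid A) = 0$ and the information cost reduces to $I(A; \Pi \mid B)$.

```python
import math

def marg(belief, axis):
    """Marginal of a belief {(a, b): prob} on Alice's (axis=0) or Bob's (axis=1) input."""
    out = {}
    for ab, pr in belief.items():
        out[ab[axis]] = out.get(ab[axis], 0.0) + pr
    return out

def info_cost(p, mu):
    """C(p) from equation (2.3), in bits."""
    pA, pB, mA, mB = marg(p, 0), marg(p, 1), marg(mu, 0), marg(mu, 1)
    total = 0.0
    for (a, b), pr in p.items():
        if pr > 0:
            total += pr * math.log2((pr / pA[a]) / (mu[(a, b)] / mA[a]))  # E_a D(p(b|a) || mu(b|a))
            total += pr * math.log2((pr / pB[b]) / (mu[(a, b)] / mB[b]))  # E_b D(p(a|b) || mu(a|b))
    return total

def ext_cost(p, mu):
    """C^ext(p) from equation (2.4): D(p || mu) in bits."""
    return sum(pr * math.log2(pr / mu[ab]) for ab, pr in p.items() if pr > 0)

mu = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
sigma = {0: 0.2, 1: 0.7}            # hypothetical signal sent by Alice

# Leaves of the one-signal protocol: (probability, belief) for bit 0 and bit 1.
leaves = []
for bit in (0, 1):
    w = {(a, b): (sigma[a] if bit else 1 - sigma[a]) * pr for (a, b), pr in mu.items()}
    P = sum(w.values())
    leaves.append((P, {ab: x / P for ab, x in w.items()}))
leaf_cost = sum(P * info_cost(p, mu) for P, p in leaves)
leaf_ext = sum(P * ext_cost(p, mu) for P, p in leaves)

# Direct computation from the joint distribution of (A, B, transcript bit X).
joint = {(a, b, x): pr * (sigma[a] if x else 1 - sigma[a])
         for (a, b), pr in mu.items() for x in (0, 1)}
muB = marg(mu, 1)
bx = {}
for (a, b, x), pr in joint.items():
    bx[(b, x)] = bx.get((b, x), 0.0) + pr
direct_cost = sum(pr * math.log2(pr * muB[b] / (mu[(a, b)] * bx[(b, x)]))
                  for (a, b, x), pr in joint.items())          # I(A; X | B)
direct_ext = sum(pr * math.log2(pr / (mu[(a, b)] * leaves[x][0]))
                 for (a, b, x), pr in joint.items())           # I(AB; X)
```

In this example `leaf_cost` matches `direct_cost` and `leaf_ext` matches `direct_ext` up to rounding, as Lemma 2.11 predicts.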

Alternatively, we can express the information cost of a protocol in terms of how much 
information each signal in the protocol leaks. 

Definition 2.12. Let $\sigma$ be a signal in protocol $\pi$ that shifts belief $p$ to $(p_0, p_1)$. Then, the
information cost $C(\sigma, p)$ of $\sigma$ is defined as

$$C(\sigma, p) = P_0\,C(p_0) + P_1\,C(p_1) - C(p) \tag{2.5}$$

The external information cost $C^{\mathrm{ext}}(\sigma, p)$ is similarly defined as

$$C^{\mathrm{ext}}(\sigma, p) = P_0\,C^{\mathrm{ext}}(p_0) + P_1\,C^{\mathrm{ext}}(p_1) - C^{\mathrm{ext}}(p)$$





Lemma 2.13. For each node $v$ in the protocol, let $p_v$ be the belief at node $v$, let $q_v$ be
the probability of reaching node $v$, and let $\sigma_v$ be the signal we send at point $v$. Then the
information cost of $\pi$ is equal to

$$\mathrm{IC}_\mu(\pi) = \sum_{v \in \pi} q_v\, C(\sigma_v, p_v)$$

Likewise the external information cost of $\pi$ is equal to

$$\mathrm{IC}^{\mathrm{ext}}_\mu(\pi) = \sum_{v \in \pi} q_v\, C^{\mathrm{ext}}(\sigma_v, p_v)$$

Proof. Expanding each $C(\sigma_v, p_v)$ out according to equation (2.5), all terms $C(p_v)$ for beliefs
corresponding to non-terminal nodes $v$ in $\pi$ cancel out (the root contributes $C(\mu) = 0$), and
we are left with

$$\sum_{\text{leaf nodes } v} q_v\, C(p_v)$$

which is exactly the expected information cost at the leaves of $\pi$, which by Lemma 2.11
is equal to $\mathrm{IC}_\mu(\pi)$, as desired. (A similar computation holds for the external information
cost). □
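
The telescoping argument can also be checked numerically on a small protocol tree. In the sketch below, a protocol is encoded (as an assumption of this example, not the paper's notation) as nested tuples `(sender, sigma, (child0, child1))` with `None` standing for a leaf; output labels are omitted because they do not affect information cost. The helper names and the numbers are hypothetical.

```python
import math

def marg(belief, axis):
    out = {}
    for ab, pr in belief.items():
        out[ab[axis]] = out.get(ab[axis], 0.0) + pr
    return out

def info_cost(p, mu):
    """C(p) from equation (2.3), in bits."""
    pA, pB, mA, mB = marg(p, 0), marg(p, 1), marg(mu, 0), marg(mu, 1)
    return sum(pr * (math.log2((pr / pA[a]) / (mu[(a, b)] / mA[a]))
                     + math.log2((pr / pB[b]) / (mu[(a, b)] / mB[b])))
               for (a, b), pr in p.items() if pr > 0)

def shift(p, sigma, sender):
    """Posterior beliefs [(P0, p0), (P1, p1)] after one signal (both outcomes possible)."""
    branches = []
    for bit in (0, 1):
        w = {ab: (sigma[ab[sender]] if bit else 1 - sigma[ab[sender]]) * pr
             for ab, pr in p.items()}
        total = sum(w.values())
        branches.append((total, {ab: x / total for ab, x in w.items()}))
    return branches

def costs(node, p, q, mu):
    """Return (sum over signal nodes of q_v C(sigma_v, p_v), sum over leaves of q_v C(p_v))."""
    if node is None:                                   # leaf
        return 0.0, q * info_cost(p, mu)
    sender, sigma, children = node
    branches = shift(p, sigma, sender)
    # Equation (2.5): cost of the signal at this node.
    signal_cost = sum(P * info_cost(pb, mu) for P, pb in branches) - info_cost(p, mu)
    by_signal, by_leaf = q * signal_cost, 0.0
    for (P, pb), child in zip(branches, children):
        s, l = costs(child, pb, q * P, mu)
        by_signal, by_leaf = by_signal + s, by_leaf + l
    return by_signal, by_leaf

mu = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
bob_node = (1, {0: 0.5, 1: 0.9}, (None, None))
alice_again = (0, {0: 0.6, 1: 0.3}, (None, None))
tree = (0, {0: 0.2, 1: 0.7}, (bob_node, alice_again))  # Alice speaks at the root
by_signal, by_leaf = costs(tree, mu, 1.0, mu)
```

The two returned sums coincide up to floating-point error, which is the content of Lemma 2.13 for this tree.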

Remark 2.14. Alternatively, one can show that (if $X_v$ is the output of signal $\sigma_v$ at node $v$)

$$C(\sigma_v, p_v) = I(X_v; A \mid B, \Pi_{\mathrm{pre}} = v) + I(X_v; B \mid A, \Pi_{\mathrm{pre}} = v) \tag{2.6}$$

(one of these two terms will equal zero, depending on which party sends signal $\sigma_v$). Lemma
2.13 then follows from an application of the chain rule.

Throughout the remainder of the paper, we will let $N = |\mathcal{A} \times \mathcal{B}| = |\mathcal{A}| \cdot |\mathcal{B}|$. Note that
$N$ is the size of the truth table of $f$ and is thus (in some sense) the size of the input to the
problem of computing the information complexity of $f$. All logarithms are to base 2 unless
otherwise specified.

3 Computability of Information Complexity 

3.1 Restricting to $\mu$ with full support

We begin by showing that we need only consider initial beliefs $\mu$ with full support; that is,
where $\mu(x,y) \ge \rho > 0$ for all $x \in \mathcal{A}$, $y \in \mathcal{B}$. We accomplish this by showing we can perturb
$\mu$ while only slightly changing the value of $\mathrm{IC}_\mu(f)$. Recall that $h : [0,1] \to [0,1]$ is Shannon's
(binary) entropy function.

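Theorem 3.1. Let $\mu \in \Delta(\mathcal{A} \times \mathcal{B})$ be a distribution without full support (i.e., $\mu(a,b) = 0$
for some $a \in \mathcal{A}$ and $b \in \mathcal{B}$). Let $\zeta \in \Delta(\mathcal{A} \times \mathcal{B})$ be the uniform distribution over pairs $(a, b)$
where $\mu(a, b) = 0$. Then, for any $\varepsilon \in (0,1)$, if $\tilde{\mu} = (1 - \varepsilon)\mu + \varepsilon\zeta$,

$$\frac{1}{1-\varepsilon}\big(\mathrm{IC}_{\tilde{\mu}}(f) - 2h(\varepsilon) - \varepsilon \log N\big) \le \mathrm{IC}_\mu(f) \le \frac{1}{1-\varepsilon}\,\mathrm{IC}_{\tilde{\mu}}(f).$$

Proof. Fix a protocol $\pi$ that successfully computes $f$. Let $Z$ be a Bernoulli random variable
with probability $\varepsilon$. Note that we can sample from $\tilde{\mu}$ by sampling from $\mu$ if $Z = 0$ and
sampling from $\zeta$ if $Z = 1$. Letting $I_\mu(\Pi; A \mid B)$ denote $I(\Pi; A \mid B)$ when $(A, B)$ is distributed
according to $\mu$, we have that

$$
\begin{aligned}
I_\mu(\Pi; A \mid B) &= I_{\tilde{\mu}}(\Pi; A \mid B, Z = 0) \\
&= \frac{1}{1-\varepsilon}\big(I_{\tilde{\mu}}(\Pi; A \mid BZ) - \varepsilon\, I_{\tilde{\mu}}(\Pi; A \mid B, Z = 1)\big) \\
&\le \frac{1}{1-\varepsilon}\, I_{\tilde{\mu}}(\Pi; A \mid BZ) \\
&\le \frac{1}{1-\varepsilon}\, I_{\tilde{\mu}}(\Pi; A \mid B)
\end{aligned}
$$

where the last inequality follows from Lemma 2.5, since $I(\Pi; Z \mid AB) = 0$. Combining this
with the corresponding calculation for $I_\mu(\Pi; B \mid A)$, we see that

$$\mathrm{IC}_\mu(\pi) \le \frac{1}{1-\varepsilon}\,\mathrm{IC}_{\tilde{\mu}}(\pi) \tag{3.1}$$

On the other hand, note that $I_{\tilde{\mu}}(\Pi; A \mid B, Z = 1) \le H(A) \le \log |\mathcal{A}|$. From this, we see
that

$$
\begin{aligned}
I_\mu(\Pi; A \mid B) &= \frac{1}{1-\varepsilon}\big(I_{\tilde{\mu}}(\Pi; A \mid BZ) - \varepsilon\, I_{\tilde{\mu}}(\Pi; A \mid B, Z = 1)\big) \\
&\ge \frac{1}{1-\varepsilon}\big(I_{\tilde{\mu}}(\Pi; A \mid BZ) - \varepsilon \log |\mathcal{A}|\big) \\
&\ge \frac{1}{1-\varepsilon}\big(I_{\tilde{\mu}}(\Pi; A \mid B) - H(Z) - \varepsilon \log |\mathcal{A}|\big) \\
&\ge \frac{1}{1-\varepsilon}\big(I_{\tilde{\mu}}(\Pi; A \mid B) - h(\varepsilon) - \varepsilon \log |\mathcal{A}|\big)
\end{aligned}
$$

Combining this with the corresponding calculation for $I_\mu(\Pi; B \mid A)$, we see that

$$\mathrm{IC}_\mu(\pi) \ge \frac{1}{1-\varepsilon}\,\mathrm{IC}_{\tilde{\mu}}(\pi) - \frac{2h(\varepsilon)}{1-\varepsilon} - \frac{\varepsilon}{1-\varepsilon} \log N \tag{3.2}$$

By taking the infimum of both sides of equations (3.1) and (3.2) over all protocols $\pi$ that
successfully compute $f$, we obtain the desired result.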


□ 


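As a corollary of Theorem 3.1, to compute $\mathrm{IC}_\mu(f)$ to within $\alpha$, it suffices to choose $\varepsilon$ in
the above theorem so that $(1-\varepsilon)^{-1}(2h(\varepsilon) + \varepsilon \log N) \le \frac{\alpha}{2}$ (so that $(1 - \varepsilon)^{-1}\,\mathrm{IC}_{\tilde{\mu}}(f)$, for the perturbed distribution $\tilde{\mu} = (1-\varepsilon)\mu + \varepsilon\zeta$, is within $\frac{\alpha}{2}$ of
$\mathrm{IC}_\mu(f)$), and then compute $\mathrm{IC}_{\tilde{\mu}}(f)$ to within an additive error of $(1 - \varepsilon)\frac{\alpha}{2}$.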
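
The choice of $\varepsilon$ in this corollary can be made explicit with a few lines of Python. This is a sketch under stated assumptions: the function names `h`, `perturbation_loss` and `pick_eps` are hypothetical, the halving search is just one convenient way to find a small enough $\varepsilon$, and the target $\alpha/2$ is this example's reading of the corollary rather than a constant confirmed by the source.

```python
import math

def h(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def perturbation_loss(eps, N):
    """Gap between the two sides of Theorem 3.1: (2 h(eps) + eps log N) / (1 - eps)."""
    return (2 * h(eps) + eps * math.log2(N)) / (1 - eps)

def pick_eps(alpha, N):
    """Halve eps until the loss is at most alpha / 2 (assumed target)."""
    eps = 0.25
    while perturbation_loss(eps, N) > alpha / 2:
        eps /= 2
    return eps

eps = pick_eps(alpha=0.1, N=16)
loss = perturbation_loss(eps, 16)
```

The loop terminates because $h(\varepsilon) \to 0$ and $\varepsilon \log N \to 0$ as $\varepsilon \to 0$; the resulting `eps` depends only on `alpha` and `N`, so it is explicitly computable.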

For the remainder of the proof, we will therefore assume that $\mu$ has full support, and
define $\rho = \min_{a,b} \mu(a, b)$. Note that, by the proof of Theorem 3.1 we can always ensure that
























3.2 Using signals far from the boundary 

We next show that we can restrict our attention to protocols where the belief at each node
is sufficiently separated from the boundary of $\Delta(\mathcal{A} \times \mathcal{B})$.

Definition 3.2. A signal $\sigma$ is a revealer if there exists an $i$ such that $\sigma_i = 1$ and $\sigma_j = 0$ for
all $j \ne i$.

Definition 3.3. A belief $p$ is $\gamma$-safe if, for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$, either $p(a, b) \ge \gamma$ or
$p(a, b) = 0$. A protocol $\pi$ is $\gamma$-safe if the signal at every node in $\pi$ without a $\gamma$-safe belief is
a revealer.

Theorem 3.4. Let $\pi$ be a communication protocol with information cost $C$. Then, for all
$\gamma \in (0,1)$, there exists a $\gamma$-safe protocol $\pi'$ that computes the same function as $\pi$ and that has
information cost at most $C + (|\mathcal{A}| + |\mathcal{B}|)\,h(\rho^{-1}\sqrt{\gamma})$.

We will make use of the following two lemmas. 

Lemma 3.5. If at some point in a protocol $\pi$, $p(a, b) \le \gamma$, then either the probability that
$A = a$ or the probability that $B = b$ must be small:

$$\min\big(p_A(a), p_B(b)\big) \le \rho^{-1}\sqrt{\gamma}$$

Proof. View the belief $p$ as a $|\mathcal{A}|$ by $|\mathcal{B}|$ matrix of real numbers. Note that each time Alice
sends a signal, she updates this belief by multiplying each row of this matrix by a different
number; likewise, every time Bob sends a signal, he updates this belief by multiplying each
column of this matrix by a different number.

At this point in the protocol, for each $a \in \mathcal{A}$, let $\lambda_a$ be the product of all the updates to
row $a$; likewise, let $\kappa_b$ be the product of all the updates to column $b$. It follows that

$$p(a,b) = \mu(a,b)\,\lambda_a \kappa_b$$

Likewise, we can write

$$p_A(a) = \lambda_a \sum_j \mu(a,j)\,\kappa_j \tag{3.3}$$

$$p_B(b) = \kappa_b \sum_i \mu(i,b)\,\lambda_i \tag{3.4}$$

Multiplying equations (3.3) and (3.4) we obtain

$$p_A(a)\,p_B(b) = \lambda_a \kappa_b \sum_{i,j} \mu(i,b)\,\mu(a,j)\,\lambda_i \kappa_j \tag{3.5}$$

Finally, note that (since $\sum_{i,j} p(i,j) = 1$), we have that

$$\sum_{i,j} \lambda_i \kappa_j\, \mu(i,j) = 1 \tag{3.6}$$

It follows from equations (3.5) and (3.6) that

$$
\begin{aligned}
p_A(a)\,p_B(b) &= \lambda_a \kappa_b \sum_{i,j} \mu(i,b)\,\mu(a,j)\,\lambda_i \kappa_j \\
&= \sum_{i,j} \frac{\mu(i,b)\,\mu(a,j)}{\mu(a,b)\,\mu(i,j)} \cdot \mu(a,b)\,\lambda_a \kappa_b \cdot \mu(i,j)\,\lambda_i \kappa_j \\
&\le \rho^{-2}\,\mu(a,b)\,\lambda_a \kappa_b \sum_{i,j} \mu(i,j)\,\lambda_i \kappa_j \\
&= \rho^{-2}\,\mu(a,b)\,\lambda_a \kappa_b \\
&= \rho^{-2}\,p(a,b) \\
&\le \rho^{-2}\,\gamma
\end{aligned}
$$

Therefore, $\min(p_A(a), p_B(b)) \le \sqrt{\rho^{-2}\gamma} = \rho^{-1}\sqrt{\gamma}$, as desired. □
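
The key inequality in this proof, $p_A(a)\,p_B(b) \le \rho^{-2}\,p(a,b)$, holds for every reachable belief, so it can be spot-checked on random instances. The following Python sketch does this; the function names, the matrix sizes, and the sampling ranges are assumptions of the example, and the row and column multipliers are drawn at random rather than produced by an actual protocol.

```python
import math
import random

def lemma_3_5_holds(mu, lam, kap):
    """Check min(p_A(a), p_B(b)) <= sqrt(p(a,b)) / rho for every pair (a, b)."""
    rows, cols = range(len(mu)), range(len(mu[0]))
    # Normalizing by z rescales the multipliers so that equation (3.6) holds.
    z = sum(lam[i] * kap[j] * mu[i][j] for i in rows for j in cols)
    p = [[mu[i][j] * lam[i] * kap[j] / z for j in cols] for i in rows]
    rho = min(min(row) for row in mu)
    pA = [sum(p[a]) for a in rows]
    pB = [sum(p[a][b] for a in rows) for b in cols]
    return all(min(pA[a], pB[b]) <= math.sqrt(p[a][b]) / rho + 1e-12
               for a in rows for b in cols)

def random_instance(rng, n=3, m=4):
    """Hypothetical full-support prior plus random row/column multipliers."""
    w = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(n)]
    total = sum(map(sum, w))
    mu = [[x / total for x in row] for row in w]
    lam = [rng.uniform(0.01, 1.0) for _ in range(n)]
    kap = [rng.uniform(0.01, 1.0) for _ in range(m)]
    return mu, lam, kap

rng = random.Random(1)
all_ok = all(lemma_3_5_holds(*random_instance(rng)) for _ in range(200))
```

Taking $\gamma = p(a,b)$ in the lemma gives exactly the inequality tested here, so `all_ok` is expected to be true for any seed.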

Lemma 3.6. Let $\pi$ be a protocol with information cost $C$. Let $v$ be a node in this protocol
with belief $p$. If, at $v$, Alice reveals whether $a = i$ (with the rest of the protocol remaining
unchanged), then this modified protocol has information cost at most $C + h(p_A(i))$. (Here $h$
is the binary entropy function).

The analogous statement holds for Bob. 


Proof. Alice can reveal whether $a = i$ by sending the revealer signal $\sigma$ where $\sigma_i = 1$ and
$\sigma_j = 0$ for $j \ne i$. Since signals commute, Alice can equivalently reveal whether $a = i$ at the
end of the protocol (assuming she passed through node $v$) instead of right after $v$.

At the end of the protocol (but before Alice reveals whether $a = i$), there may be
multiple possible terminal beliefs; label these beliefs $p_1$ through $p_K$, with belief $p_k$ occurring
with probability $Q_k$. Since these are the terminal beliefs that are descendants of node $v$, it
follows that

$$\sum_{k=1}^{K} Q_k\, p_k = p$$

In particular, $\sum_k Q_k\, p_{k,A}(i) = p_A(i)$. Now, as a consequence of equation (2.3), revealing
whether $a = i$ while at belief $p_k$ increases the expected information cost of the node by

$$
\begin{aligned}
\sum_b p_{k,B}(b)\, h\big(p_{k,A|B}(i \mid b)\big) &\le h\Big(\sum_b p_{k,B}(b)\, p_{k,A|B}(i \mid b)\Big) && (3.7),\ (3.8) \\
&= h\big(p_{k,A}(i)\big) && (3.9)
\end{aligned}
$$

where the inequality follows from Jensen's inequality, since $h(x)$ is concave. It follows that
the total expected increase in the information cost of this protocol is at most

$$\sum_{k=1}^{K} Q_k\, h\big(p_{k,A}(i)\big) \le h\Big(\sum_{k=1}^{K} Q_k\, p_{k,A}(i)\Big) = h\big(p_A(i)\big)$$

where the first inequality again follows from Jensen's inequality. This completes the proof.

□

We can now complete the proof of Theorem 3.4.


Proof of Theorem 3.4. We will construct $\pi'$ from $\pi$ in the following manner: follow $\pi$ until
you reach a belief $p$ satisfying $p(i,j) < \gamma$ for some choice of $i$ and $j$. Then, by Lemma 3.5,
either $p_A(i)$ or $p_B(j)$ is at most $\rho^{-1}\sqrt{\gamma}$. Without loss of generality, assume $p_A(i) \le \rho^{-1}\sqrt{\gamma}$.
Then, Alice will reveal whether $a = i$. We repeat this process until the resulting protocol is
$\gamma$-safe.

Note that on any complete path through $\pi'$, Alice and Bob perform at most $|\mathcal{A}| + |\mathcal{B}|$
reveals (since each reveal eliminates at least one of the $|\mathcal{A}|$ options for $a$ or the $|\mathcal{B}|$ options
for $b$). By Lemma 3.6, this means the information cost of $\pi'$ is at most $(|\mathcal{A}| + |\mathcal{B}|)\,h(\rho^{-1}\sqrt{\gamma})$
larger than the information cost of $\pi$, as desired. □
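
To see how the overhead term $(|\mathcal{A}| + |\mathcal{B}|)\,h(\rho^{-1}\sqrt{\gamma})$ translates into a concrete choice of $\gamma$, here is a small Python sketch. The names `reveal_overhead` and `pick_gamma`, the starting point $\gamma = (\rho/2)^2$, and the sample parameters are all assumptions of this example; the source only asserts that a suitable explicit $\gamma$ exists.

```python
import math

def h(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def reveal_overhead(gamma, rho, size_a, size_b):
    """Extra information cost allowed by Theorem 3.4 for a gamma-safe protocol."""
    return (size_a + size_b) * h(math.sqrt(gamma) / rho)

def pick_gamma(eps, rho, size_a, size_b):
    """Shrink gamma until the overhead is at most eps.

    Starts at gamma = (rho / 2)^2, where rho^{-1} sqrt(gamma) = 1/2, so that h is
    increasing on the range searched; each step halves rho^{-1} sqrt(gamma).
    """
    gamma = (rho / 2) ** 2
    while reveal_overhead(gamma, rho, size_a, size_b) > eps:
        gamma /= 4
    return gamma

gamma = pick_gamma(eps=0.01, rho=0.05, size_a=2, size_b=2)
overhead = reveal_overhead(gamma, rho=0.05, size_a=2, size_b=2)
```

Because $h(x) \to 0$ as $x \to 0$, the search always stops, and the returned `gamma` is an explicit function of `eps`, `rho` and the input sizes.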


3.3 Using signals of bounded size 

We next show that we can restrict our attention to protocols that only use signals of a 
bounded size. Here, by a bounded size, we require both that each individual component of 
the signal is sufficiently small and that the amount the signal shifts the corresponding belief 
is sufficiently large. 

Definition 3.7. A signal $\sigma$ that shifts $p$ to $(p_0, p_1)$ has power $d$ at belief $p$ if

$$d = \max\big(\|p - p_0\|_\infty,\ \|p - p_1\|_\infty\big)$$

(When it is clear from context, we will often omit the specific belief $p$).

Recall that a signal is balanced if the probability $P_0$ that it is 0 is equal to $P_1 = 1/2$. Also
recall that by Definition 2.6 the size of a signal is its maximum input-wise deviation from
$1/2$, given by $|\sigma| = \max_{s \in S} |\tfrac{1}{2} - \sigma_s|$. We prove:

Theorem 3.8. Let $\pi$ be a $\gamma$-safe communication protocol with information cost $C$. Then,
for every e > 0, there exists a n f-safe communication protocol tt' that computes the same 
function as $\pi$ with information cost at most $C + \varepsilon$, but that only uses (in addition to revealer
signals) balanced signals of size at most $\gamma^{-1}\delta$ and power at least $\delta$, for some positive $\delta$.

Remark 3.9. Note that unlike $\gamma$, which is (an explicit, easily computable) function of $\varepsilon$, we
do not assert anything about the computability of $\delta$ in Theorem 3.8. We only need to know
that such a $\delta$ exists. The dependence on $\delta$ will be removed later in the analysis.

To prove Theorem 3.8, we will make use of two lemmas. The first lemma provides a
connection between the size and power of a ($\gamma$-safe) signal.

Lemma 3.10. If signal $\sigma$ is balanced and has power at most $2\delta$ at a $\gamma$-safe belief $p$, then

$$|\sigma| \le \frac{\delta}{\gamma}$$

Proof. Since $\sigma$ is balanced at $p$, $P_0 = P_1 = \tfrac{1}{2}$, so it follows from equations (2.1) and (2.2) that

$$p_0(a,b) = 2\,p(a,b)\,(1 - \sigma_{a,b}) \tag{3.10}$$

$$p_1(a,b) = 2\,p(a,b)\,\sigma_{a,b} \tag{3.11}$$

(if Alice is sending this signal, then $\sigma_{a,b} = \sigma_a$; similarly, if Bob is sending this signal, then
$\sigma_{a,b} = \sigma_b$). From equation (3.11) we see that

$$\sigma_{a,b} = \frac{p_1(a,b)}{2\,p(a,b)}$$

and in particular,

$$\Big|\sigma_{a,b} - \frac{1}{2}\Big| = \frac{|p_1(a,b) - p(a,b)|}{2\,p(a,b)}$$

Since belief $p$ is $\gamma$-safe, $p(a,b) \ge \gamma$, and since $\sigma$ has power at most $2\delta$ at $p$,
$|p_1(a,b) - p(a,b)| \le 2\delta$. It follows that

$$\Big|\sigma_{a,b} - \frac{1}{2}\Big| \le \frac{\delta}{\gamma}$$

and therefore that $|\sigma| \le \frac{\delta}{\gamma}$, as desired.

□


The second lemma allows us to ‘decompose’ a signal into a sequence of smaller subsignals. 


Definition 3.11. Let $\sigma$ be a signal that shifts the belief $p$ to $(p_0, p_1)$. A signal $\sigma'$ that shifts
$q$ to $(q_0, q_1)$ is a subsignal of $\sigma$ if $q$ lies on the segment connecting $p_0$ and $p_1$, $q_0$ lies on the
segment connecting $q$ and $p_0$, and $q_1$ lies on the segment connecting $q$ and $p_1$.


Lemma 3.12. If Alice can send a signal $\sigma$, then she can also send all subsignals of $\sigma$. The
analogous statement holds for Bob.


Proof. First, note that since $q$ is a convex combination of $q_0$ and $q_1$, there is some signal
that shifts $q$ to $(q_0, q_1)$. Recall that Alice can send any signal that satisfies $\sigma_{a,b} = \sigma_{a,b'}$ for all
$b, b' \in B$ and all $a \in A$. Since $\sigma_{a,b} = \frac{p_1(a,b)}{p(a,b)} \sum_{i,j} \sigma_{i,j}\, p(i,j)$ (equation 2.1), we have that

$$\frac{p_1(a,b)}{p(a,b)} = \frac{p_1(a,b')}{p(a,b')}.$$

Since $q_1$ and $q$ are linear combinations of $p_1$ and $p$, it follows that

$$\frac{q_1(a,b)}{q(a,b)} = \frac{q_1(a,b')}{q(a,b')}$$

and therefore that $\sigma'_{a,b} = \sigma'_{a,b'}$. It follows that Alice can send signal $\sigma'$. □

We can now proceed to prove Theorem 3.8.


Proof of Theorem 3.8. Let $p_{\min}$ be the minimum power of a signal in $\pi$. We will choose $\delta$ to
equal min ( 2 ^, ^,). 

Let $\sigma$ be an arbitrary signal in $\pi$ that shifts the belief $p$ to $(p_0, p_1)$. We will replace $\sigma$
with the following 'subprotocol'. Intuitively, the subprotocol uses several small
signals of power roughly $\delta$ to perform a random walk on the segment between beliefs $p_0$ and
$p_1$, terminating when it hits one of the two boundary beliefs. Since this protocol ensures
that the belief $p$ evolves to either belief $p_0$ or belief $p_1$ (with the corresponding uniquely
determined probabilities), it accomplishes the same effect on the distribution of beliefs as
sending signal $\sigma$.

More specifically, we can describe the subprotocol as follows. Assume our current belief
$q$ lies on the segment between $p_0$ and $p_1$. Compute $d$, the distance from $q$ to the nearest
endpoint (that is, $d = \min(\|q - p_0\|_\infty, \|q - p_1\|_\infty)$). If $d < 2\delta$, send a balanced subsignal
of $\sigma$ of power $d$; this either sends $q$ to the nearest endpoint, or increases $d$ to $2d$ (since
$\|p_1 - p_0\|_\infty \ge p_{\min} > 10\delta$). On the other hand, if $d \ge 2\delta$, simply send a balanced subsignal
of $\sigma$ of power $\delta$; this decreases $d$ by at most $\delta$.

It is straightforward to see that in the above subprotocol, we only ever send balanced
signals of power between $\delta$ and $2\delta$ (in particular, we never get closer than $\delta$ to an endpoint
until we reach it). Since $\pi$ is $\gamma$-safe, it follows from Lemma 3.10 that each signal we use in
this subprotocol also has size at most $\gamma^{-1}\delta$.

Since the $L_\infty$ distance between $p_0$ and $p_1$ is at most 1, this random walk will terminate
with probability 1 in finite time. Unfortunately, our resulting protocol is no longer finite. We
can remedy this by adjusting our subprotocol so that after some large number $T$ of steps, the
two parties abort the protocol and simply exchange both of their inputs. This ensures the
two parties can successfully compute the function $f$ but potentially increases the information
cost of the protocol. Since the information cost at any node of the protocol is bounded above
(by $\log|A| + \log|B|$) and since the probability we have to abort our subprotocol decreases in
$T$, by choosing a sufficiently large value of $T$ we can ensure, for any $\varepsilon > 0$, that this modified
protocol has information cost at most $C + \varepsilon$. □
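The random walk in this proof can be illustrated with a small simulation. The sketch below is not part of the paper: it assumes the segment between the two boundary beliefs is parametrized by a single coordinate in [0, length], and the names `walk` and `hit_frequency` are hypothetical. It uses the step rule from the proof (step `d` when `d < 2*delta`, step `delta` otherwise) and exact rational arithmetic so that endpoints are hit exactly.

```python
import random
from fractions import Fraction


def walk(x0, length, delta, rng):
    """Run one walk on [0, length]; return (endpoint reached, step sizes used)."""
    x, steps = x0, []
    while 0 < x < length:
        d = min(x, length - x)                  # distance to the nearest endpoint
        step = d if d < 2 * delta else delta    # step rule from the proof
        steps.append(step)
        # A balanced signal moves the belief either way with probability 1/2.
        x += step if rng.random() < 0.5 else -step
    return x, steps


def hit_frequency(x0, length, delta, trials=4000, seed=0):
    """Estimate Pr[walk ends at `length`] and record the extreme step sizes."""
    rng = random.Random(seed)
    hits, sizes = 0, []
    for _ in range(trials):
        end, steps = walk(x0, length, delta, rng)
        hits += (end == length)
        sizes.extend(steps)
    return hits / trials, min(sizes), max(sizes)
```

Because every step is balanced, the position is a martingale, so the walk should end at the far endpoint with probability `x0 / length`; this is the sense in which the subprotocol reproduces the effect of the original signal. Under the assumption `length > 10 * delta`, every step size stays in the range from `delta` up to (but excluding) `2 * delta`.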

3.4 Using a bounded number of signals 

We now show that we can convert any protocol into a protocol that only uses a bounded 
number of distinct signals, while only increasing the information leaked by a small additive 
factor. 

Theorem 3.13. Let $\pi$ be a $\gamma$-safe communication protocol with information cost $C$ that only
uses (in addition to revealer signals) balanced signals of size at most $\gamma^{-1}\delta$ and power at least
$\delta$. Then, for any $\varepsilon > 0$, there exists a communication protocol $\pi'$ that computes the same
function as $\pi$ with information cost at most $C + \varepsilon$ but that only uses $Q$ different signals,
where

$$Q = \left(\frac{648}{\varepsilon\gamma^3 \ln 2}\right)^{N/2} + (|A| + |B|).$$

To show this, we first argue that signals that are close component-wise have similar effects
when they act on beliefs.

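Lemma 3.14. Let $\sigma$ and $\sigma'$ be two signals over a set of size $N$ such that $\sigma$ is balanced at
belief $p$, and for each $i$, $|\sigma_i - \sigma'_i| \le \varepsilon$ (for some $\varepsilon < 1/6$). Let $\sigma$ shift belief $p$ to $(p_0, p_1)$ and
$\sigma'$ shift belief $p$ to $(p'_0, p'_1)$. Then, as elements of $\mathbb{R}^N$, $\|p_0 - p'_0\|_2 \le 9\varepsilon$ and $\|p_1 - p'_1\|_2 \le 9\varepsilon$.

Proof. Recall that

$$p_1(i) = \frac{\sigma_i\, p(i)}{\sum_j \sigma_j\, p(j)}, \qquad p'_1(i) = \frac{\sigma'_i\, p(i)}{\sum_j \sigma'_j\, p(j)}.$$

Moreover, note that since $\sigma$ is balanced at $p$, $\sum_i \sigma_i\, p(i) = \tfrac{1}{2}$. Now, since $|\sigma_i - \sigma'_i| \le \varepsilon$ for all
$i$, we have that

$$\left|\sum_i \sigma'_i\, p(i) - \tfrac{1}{2}\right| \le \varepsilon.$$

It follows that

$$\begin{aligned}
p_1(i) - p'_1(i) &\le \frac{\sigma_i\, p(i)}{1/2} - \frac{(\sigma_i - \varepsilon)\, p(i)}{1/2 + \varepsilon} \\
&= \frac{\sigma_i\, p(i)(1/2 + \varepsilon) - (1/2)(\sigma_i - \varepsilon)\, p(i)}{(1/2)(1/2 + \varepsilon)} \\
&= \frac{\sigma_i\, p(i)\,\varepsilon + (1/2)\, p(i)\,\varepsilon}{(1/2)(1/2 + \varepsilon)} \\
&= \left(\frac{\sigma_i}{(1/2)(1/2 + \varepsilon)} + \frac{1}{1/2 + \varepsilon}\right)\varepsilon\, p(i) \\
&\le (4 + 2)\,\varepsilon\, p(i) \\
&= 6\,\varepsilon\, p(i)
\end{aligned}$$

(where this last inequality follows from the fact that $\sigma_i$ is less than 1). Likewise,

$$\begin{aligned}
p'_1(i) - p_1(i) &\le \frac{(\sigma_i + \varepsilon)\, p(i)}{1/2 - \varepsilon} - \frac{\sigma_i\, p(i)}{1/2} \\
&= \frac{(1/2)(\sigma_i + \varepsilon)\, p(i) - (1/2 - \varepsilon)\,\sigma_i\, p(i)}{(1/2)(1/2 - \varepsilon)} \\
&= \frac{\sigma_i\, p(i)\,\varepsilon + (1/2)\, p(i)\,\varepsilon}{(1/2)(1/2 - \varepsilon)} \\
&= \left(\frac{\sigma_i}{(1/2)(1/2 - \varepsilon)} + \frac{1}{1/2 - \varepsilon}\right)\varepsilon\, p(i) \\
&\le (6 + 3)\,\varepsilon\, p(i) \\
&= 9\,\varepsilon\, p(i)
\end{aligned}$$

(using $\varepsilon < 1/6$, so that $1/2 - \varepsilon > 1/3$). It follows that $|p'_1(i) - p_1(i)| \le 9\varepsilon\, p(i)$, and therefore that
$\|p_1 - p'_1\|_2 \le 9\varepsilon\sqrt{\sum_i |p(i)|^2} \le 9\varepsilon$. The proof for $p_0$ and $p'_0$ follows similarly. □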
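As a numerical sanity check of Lemma 3.14, the following sketch perturbs a balanced signal coordinate-wise by at most ε and compares the resulting posteriors. It is an illustration only, under assumptions not in the paper: the belief is uniform on four inputs, balanced signals are built from pairs (s, 1 - s), and the function names are hypothetical.

```python
import math
import random


def shift(sigma, p):
    """Posteriors (p0, p1) after a one-bit signal; sigma[i] = Pr[bit = 1 | input i]."""
    total = sum(s * q for s, q in zip(sigma, p))            # P_1
    p1 = [s * q / total for s, q in zip(sigma, p)]
    p0 = [(1 - s) * q / (1 - total) for s, q in zip(sigma, p)]
    return p0, p1


def l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def worst_ratio(eps=0.05, trials=2000, seed=1):
    """Largest observed ||p_j - p'_j||_2 / eps over random balanced signals."""
    rng = random.Random(seed)
    p = [0.25] * 4                                          # assumed uniform belief
    worst = 0.0
    for _ in range(trials):
        s1, s2 = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
        sigma = [s1, 1 - s1, s2, 1 - s2]                    # balanced at uniform p
        near = [s + rng.uniform(-eps, eps) for s in sigma]  # within eps coordinate-wise
        p0, p1 = shift(sigma, p)
        q0, q1 = shift(near, p)
        worst = max(worst, l2(p0, q0) / eps, l2(p1, q1) / eps)
    return worst
```

The lemma predicts that the returned ratio never exceeds 9 when ε < 1/6; in practice the observed ratio is typically much smaller, since the constant 9 comes from worst-case estimates.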

Our main strategy for reducing the number of distinct signals used by our protocol is
to choose a dense set $S$ of signals and replace each signal in our protocol with a nearby
signal in $S$. The following lemma bounds the additional information leaked by this
replacement procedure.

Lemma 3.15. Let $\sigma$ be a signal in protocol $\pi$ that shifts belief $p$ to $(p_0, p_1)$. Write $\sigma$ as the
convex combination $\sum_{i=1}^n w_i \sigma^{(i)}$ of $n$ signals $\sigma^{(i)}$ (with all $w_i \in [0,1]$ and $\sum_i w_i = 1$). Let $q_\sigma$
be the probability that signal $\sigma$ is sent as part of protocol $\pi$ (i.e., the probability we reach the
corresponding node of $\pi$), and let $\pi_i$ be the protocol obtained by replacing signal $\sigma$ with signal
$\sigma^{(i)}$. Then for some $i$,

$$IC_\mu(\pi_i) \le IC_\mu(\pi) + q_\sigma\left[\left(\sum_{i=1}^n w_i\, C(\sigma^{(i)}, p)\right) - C(\sigma, p)\right].$$

Proof. Without loss of generality, let us assume that Alice is sending signal $\sigma$. Let us consider
two possibilities for Alice's action when she is about to send signal $\sigma$.

In the first case, she chooses a signal $\sigma^{(i)}$ randomly with probability $w_i$ and sends that
signal. This is equivalent to just sending signal $\sigma$ (in particular, the probability we send a 1
is $\sum_i w_i \sigma^{(i)}_a = \sigma_a$), and altogether, this is equivalent to executing the original protocol.

In the second case, she chooses a signal $\sigma^{(i)}$ randomly with probability $w_i$ and sends that
signal, along with the index $i$ that she chose. This is equivalent to choosing a protocol $\pi_i$
randomly with probability $w_i$ and executing that protocol (in particular, she can choose the
index $i$ at the beginning of the protocol). Note that, since $\pi$ is a zero-error protocol that
computes $f$, $\pi_i$ must also compute $f$ with zero error, so $\pi_i$ is also a valid protocol for this
problem.

Let $K$ be the random variable corresponding to the index that Alice chooses (if signal
$\sigma$ is never sent, then $K = -1$), and as before, let $\Pi$ be the random variable corresponding
to the transcript of $\pi$. The total information Alice reveals to Bob in the first case is then
$I(\Pi; A|B)$, and the (expected) total information Alice reveals to Bob in the second case is
then $I(\Pi K; A|B)$.

Divide $\Pi$ into two parts: $\Pi_{pre}$, which contains the transcript of $\pi$ up to and including the
transmission of $\sigma$, and $\Pi_{fin}$, which contains the remainder of the transcript after the index
$K$ is revealed. Note that, by Lemma 2.13,

$$I(\Pi_{pre} K; A|B) - I(\Pi_{pre}; A|B) = q_\sigma\left[\left(\sum_{i=1}^n w_i\, C(\sigma^{(i)}, p)\right) - C(\sigma, p)\right].$$

Since $K$ and $\Pi_{fin}$ are conditionally independent given $A$, $B$ and $\Pi_{pre}$, i.e. $I(K; \Pi_{fin}|\Pi_{pre} A B) = 0$,
by Lemma 2.5 it follows that

$$\begin{aligned}
I(\Pi K; A|B) - I(\Pi; A|B) &= I(K; A|\Pi B) \\
&= I(K; A|\Pi_{pre} B\, \Pi_{fin}) \\
&\le I(K; A|\Pi_{pre} B) \\
&= I(\Pi_{pre} K; A|B) - I(\Pi_{pre}; A|B) \\
&= q_\sigma\left[\left(\sum_{i=1}^n w_i\, C(\sigma^{(i)}, p)\right) - C(\sigma, p)\right].
\end{aligned}$$

Since $\mathbb{E}[IC_\mu(\pi_i)] - IC_\mu(\pi) = I(\Pi K; A|B) - I(\Pi; A|B)$, the result follows. □


Finally, we use the continuity properties of the cost function $C$ to effectively bound the
quantities in Lemma 3.15.

Lemma 3.16. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a function that is smooth on a convex compact subset $R$
of $\mathbb{R}^d$. Let $x$ be a point in $R$. Let $x_1, \ldots, x_k \in R$ and $w_1, \ldots, w_k \in [0,1]$ satisfy $\sum_i w_i x_i = x$,
$\sum_i w_i = 1$, and $\|x - x_i\| \le \varepsilon$ for all $i$ (here $\|\cdot\|$ is the standard Euclidean norm). Then

$$\left| f(x) - \sum_{i=1}^k w_i f(x_i) \right| \le U\varepsilon^2$$

where

$$U = \max_{z \in R} \left|\lambda_{\max}(D^2 f(z))\right|,$$

$\lambda_{\max}(M)$ is the largest eigenvalue (by absolute value) of $M$, and $D^2 f(a)$ is the Hessian
of $f$ at $a$.

Proof. For each $i$, let $v_i = x_i - x$. By the Taylor expansion of $f$ (with the mean-value form
of the remainder), we know that for any $x$ in $R$,

$$f(x + v) = f(x) + v^T Df(x) + \tfrac{1}{2}\, v^T D^2 f(y)\, v$$

for some $y$ on the line segment connecting $x$ and $x + v$. Since $\sum_i w_i = 1$ and $\sum_i w_i v_i = 0$, it
follows that

$$\sum_{i=1}^k w_i f(x_i) = f(x) + \tfrac{1}{2}\sum_{i=1}^k w_i\, v_i^T D^2 f(y_i)\, v_i$$

for some $y_i$ on the line segment connecting $x$ and $x_i$. Since $|v^T M v| \le |\lambda_{\max}(M)| \cdot \|v\|^2$,
$|v_i^T D^2 f(y_i)\, v_i| \le U\varepsilon^2$ for all $i$. It follows that

$$\left|\sum_{i=1}^k w_i\, v_i^T D^2 f(y_i)\, v_i\right| \le \sum_{i=1}^k w_i \left|v_i^T D^2 f(y_i)\, v_i\right| \le U\varepsilon^2$$

and therefore that

$$\left| f(x) - \sum_{i=1}^k w_i f(x_i) \right| \le \tfrac{1}{2} U\varepsilon^2 \le U\varepsilon^2.$$

□
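A small worked instance of Lemma 3.16 may help fix ideas. The sketch below is illustrative only: the quadratic `f`, its constant Hessian, and the helper names are assumptions chosen for the example, not objects from the paper. It compares the gap between f at a point and a weighted average of f at nearby points against the bound Uε².

```python
import math

# Hessian of the example function f below; it is constant because f is quadratic.
HESSIAN = ((2.0, -3.0), (-3.0, 4.0))


def f(x, y):
    """Example smooth function (an assumption made for this illustration)."""
    return x * x - 3 * x * y + 2 * y * y


def jensen_gap(center, offsets, weights):
    """|f(x) - sum_i w_i f(x + v_i)| for offsets v_i with sum_i w_i v_i = 0."""
    cx, cy = center
    avg = sum(w * f(cx + dx, cy + dy) for w, (dx, dy) in zip(weights, offsets))
    return abs(f(cx, cy) - avg)


def spectral_bound():
    """Largest eigenvalue (in absolute value) of the symmetric 2x2 Hessian."""
    (a, b), (_, d) = HESSIAN
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return max(abs(mean + radius), abs(mean - radius))
```

For example, three offsets of length ε pointing at angles 0°, 120° and 240° with equal weights 1/3 average to zero, so the lemma applies and the gap should be at most `spectral_bound() * eps ** 2`.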


It is straightforward to verify that the cost function $C(p)$ is smooth over the region $R_\gamma$
given by $p(i,j) \ge \gamma$ and thus satisfies the condition of Lemma 3.16. Moreover, we can
compute explicit upper bounds for the constant $U$ for this function. We compute one such
bound below.

Lemma 3.17. Let $R_\gamma$ be the subset of $\mathbb{R}^N$ defined by $p(a,b) \in [\gamma, 1]$ for all $a \in A$
and $b \in B$ (in particular, we do not have the constraint that $\sum_{a,b} p(a,b) = 1$). Then if

$$U_\gamma = \max_{p \in R_\gamma} \left|\lambda_{\max}(D^2 C(p))\right|$$

we have that $U_\gamma \le (2/\ln 2)\,\gamma^{-1}$.

Proof. Recall (equation 2.3) that $C(p) = \mathbb{E}_{a \sim p}[D(p(b|a)\,\|\,\mu(b|a))] + \mathbb{E}_{b \sim p}[D(p(a|b)\,\|\,\mu(a|b))]$.
Write

$$C_A(p) = \mathbb{E}_{a \sim p}[D(p(b|a)\,\|\,\mu(b|a))]$$

$$C_B(p) = \mathbb{E}_{b \sim p}[D(p(a|b)\,\|\,\mu(a|b))]$$

Note that we can write

$$\begin{aligned}
C_A(p) &= \mathbb{E}_{a \sim p}[D(p(b|a)\,\|\,\mu(b|a))] \\
&= \sum_a p(a)\, D(p(b|a)\,\|\,\mu(b|a)) \\
&= \sum_{a,b} p(a)\, p(b|a) \log\frac{p(b|a)}{\mu(b|a)} \\
&= \sum_{a,b} p(a,b)\left(\log p(a,b) - \log p(a) - \log\mu(a,b) + \log\mu(a)\right) \\
&= \left(\sum_{a,b} p(a,b)\log p(a,b)\right) - \left(\sum_a p(a)\log p(a)\right) \\
&\qquad - \left(\sum_{a,b} p(a,b)\log\mu(a,b)\right) + \left(\sum_a p(a)\log\mu(a)\right).
\end{aligned}$$

Let $D_{a,b}$ stand for $\frac{\partial}{\partial p(a,b)}$. Then, it follows that (over $p \in R_\gamma$):

$$\begin{aligned}
D_{a,b}\, C_A &= \log p(a,b) - \log p(a) - \log\mu(a,b) + \log\mu(a) \\
D^2_{a,b}\, C_A &= \frac{1}{\ln 2}\left(\frac{1}{p(a,b)} - \frac{1}{p(a)}\right) \\
\left|D^2_{a,b}\, C_A\right| &\le (\ln 2)^{-1}\gamma^{-1} \\
D_{a,b'} D_{a,b}\, C_A &= -\frac{1}{p(a)\ln 2} \\
\left|D_{a,b'} D_{a,b}\, C_A\right| &\le (\ln 2)^{-1}\gamma^{-1} \\
D_{a',b} D_{a,b}\, C_A &= 0 \\
D_{a',b'} D_{a,b}\, C_A &= 0
\end{aligned}$$

with similar equations for $C_B$. It follows that the maximum entry (by absolute value) of
$D^2 C(p)$ for $p$ in the region $R_\gamma$ is bounded above by $(2/\ln 2)\,\gamma^{-1}$. Since the largest eigenvalue
of a matrix is bounded above by the largest entry in the matrix, this implies the desired
bound on $U_\gamma$.

□
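The second-derivative formulas in this proof can be checked numerically with central finite differences. The sketch below is an illustration under stated assumptions: beliefs and the prior are stored as dictionaries keyed by `(a, b)`, logarithms are base 2, and the helper names `cost_A` and `second_diff` are hypothetical.

```python
import math


def cost_A(p, mu):
    """C_A(p) = sum_{a,b} p(a,b) [log p(a,b) - log p(a) - log mu(a,b) + log mu(a)]."""
    def marginal(q, a):
        return sum(v for (x, _), v in q.items() if x == a)

    return sum(
        v * (math.log2(v) - math.log2(marginal(p, a))
             - math.log2(mu[a, b]) + math.log2(marginal(mu, a)))
        for (a, b), v in p.items()
    )


def second_diff(p, mu, k1, k2, h=1e-4):
    """Central finite-difference estimate of D_{k1} D_{k2} C_A at p.

    When k1 == k2 the two perturbations add up, which yields the usual
    central second difference with step 2h.
    """
    def at(s1, s2):
        q = dict(p)
        q[k1] += s1 * h
        q[k2] += s2 * h
        return cost_A(q, mu)

    return (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
```

With `p = {(0,0): 0.2, (0,1): 0.3, (1,0): 0.1, (1,1): 0.4}` and a uniform `mu`, the estimate for the pair `((0,0), (0,0))` should be close to `(1/ln 2)(1/0.2 - 1/0.5)`, the estimate for `((0,0), (0,1))` close to `-1/(0.5 ln 2)`, and the estimate for `((0,0), (1,0))` close to 0, matching the three cases in the proof.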

We can now proceed to prove Theorem 3.13.

Proof of Theorem 3.13. Let $U_\gamma = (2/\ln 2)\,\gamma^{-1}$, and set $M = \sqrt{81 U_\gamma/\varepsilon}$. Let $S$ be the set of
signals where each $\sigma_i$ is of the form $\tfrac{1}{2} + \tfrac{\delta}{M} k_i$, for some integer $k_i$ between $-\gamma^{-1}M$ and $\gamma^{-1}M$.
Let $\sigma$ be a non-revealer signal sent at node $x$ of protocol $\pi$, and let $p = p_x$ be the belief
conditioned on the protocol reaching the node $x$. Then, since $\sigma$ is balanced and has power
at least $\delta$, it shifts belief $p$ to $(p - v, p + v)$, for some $v \in \mathbb{R}^N$ with $\|v\| \ge \delta$.

Since our signal $\sigma$ has size at most $\gamma^{-1}\delta$, it is contained within a 'hypercube' in the
space of signals whose vertices belong to $S$. It follows that we can write $\sigma$ as the convex
combination $\sum_{k=1}^{2^N} w_k \sigma^{(k)}$ of $2^N$ signals $\sigma^{(k)}$ in $S$ such that $|\sigma_i - \sigma^{(k)}_i| \le \tfrac{\delta}{M}$ for all $i$ and $k$ (in
fact, it can be written as the convex combination of $N$ of these signals, but this does not
improve our resulting bound). It follows from Lemma 3.14 that if $\sigma^{(k)}$ shifts $p$ to $(p_0^{(k)}, p_1^{(k)})$,
then $\|p_0 - p_0^{(k)}\| \le \tfrac{9\delta}{M}$ and $\|p_1 - p_1^{(k)}\| \le \tfrac{9\delta}{M}$.

For ease of notation, let

$$E(\sigma) = \left(\sum_k w_k\, C(\sigma^{(k)}, p)\right) - C(\sigma, p).$$

Then, by Lemma 3.16 we have that

$$\begin{aligned}
E(\sigma) &= P_0\left(\sum_{k=1}^{2^N} w_k\, C(p_0^{(k)}) - C(p_0)\right) + P_1\left(\sum_{k=1}^{2^N} w_k\, C(p_1^{(k)}) - C(p_1)\right) \\
&\le \frac{81 U_\gamma \delta^2}{M^2}\,(P_0 + P_1) \\
&= \frac{81 U_\gamma \delta^2}{M^2}.
\end{aligned}$$

(Since $\sigma$ is balanced, it is in fact the case that $P_0 = P_1 = \tfrac{1}{2}$, but we only need that $P_0 + P_1 = 1$
above.) Now, let $V(p) = \|p\|_2^2$, and let $V(\sigma, p) = \tfrac{1}{2}\left(V(p - v) + V(p + v)\right) - V(p)$. Note that

$$V(\sigma, p) = \tfrac{1}{2}\left(\|p - v\|^2 + \|p + v\|^2\right) - \|p\|^2 = \|v\|^2 \ge \delta^2.$$

It follows that

$$\frac{E(\sigma)}{V(\sigma, p)} \le \frac{81 U_\gamma}{M^2} = \varepsilon. \tag{3.12}$$

By Lemma 3.15 there exists some $k$ such that replacing $\sigma$ with $\sigma^{(k)}$ increases the information
cost by at most $q_\sigma E(\sigma)$. Repeatedly performing this procedure, we can replace all
of the signals in $\pi$ with signals in $S$ while increasing the information cost by at most

$$\sum_{\sigma \in \pi} q_\sigma E(\sigma) \le \varepsilon \sum_{\sigma \in \pi} q_\sigma V(\sigma, p)$$

where the inequality follows by equation 3.12. Since $V$ telescopes along the protocol tree in the
same way as the information cost does,

$$\sum_{\sigma \in \pi} q_\sigma V(\sigma, p) \le \sum_{\text{leaf nodes } v} q_v \|p_v\|^2 \le 1,$$

it follows that our new protocol has information cost at most $IC_\mu(\pi) + \varepsilon$. In addition, since
there are only $|A| + |B|$ distinct revealer signals, the total number of distinct signals in our
new protocol is at most $|S| + (|A| + |B|) = (2\gamma^{-1}M)^N + (|A| + |B|) = Q$, as desired. □
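To see how the constants in this proof fit together, the following sketch evaluates the grid resolution M and the signal count Q for concrete parameters. It is a worked calculation, not part of the paper; the function name and argument names are hypothetical, and the grid size is taken as (2M/γ)^N exactly as written in the proof.

```python
import math


def grid_parameters(eps, gamma, n_inputs, size_a, size_b):
    """Return (M, Q from the grid count, Q from the closed form in Theorem 3.13)."""
    u_gamma = (2 / math.log(2)) / gamma             # bound from Lemma 3.17
    m = math.sqrt(81 * u_gamma / eps)               # chosen so that 81 * U / M^2 = eps
    grid_count = (2 * m / gamma) ** n_inputs        # (2 * gamma^-1 * M)^N grid signals
    closed_form = (648 / (eps * gamma ** 3 * math.log(2))) ** (n_inputs / 2)
    revealers = size_a + size_b                     # at most |A| + |B| revealer signals
    return m, grid_count + revealers, closed_form + revealers
```

The two values of Q agree because (2M/γ)² = 4 · 81 · U_γ / (ε γ²) = 648 / (ε γ³ ln 2). The count grows very quickly as ε or γ shrinks, which is consistent with the (Nε⁻¹)^{O(N)} bounds that appear later in the section.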
3.5 Using a bounded number of alternations

Finally, we show that we can convert a protocol for / that uses a bounded number of distinct 
signals (yet arbitrarily many of them) into a protocol for / that, while leaking at most e 
extra information, uses a bounded number of alternations (steps in the protocol where Alice 
stops talking and Bob starts talking, or vice versa). 

We achieve this by ‘bundling’ signals of the same type together; that is, at a point in the 
protocol where Alice would send Bob a certain signal, she may instead send him a bundle 
of t signals. Then, the next t — 1 times Alice would send Bob this signal, Bob instead refers 
to the next unused signal in the bundle. If there are unused signals in a bundle, this may 
increase the information cost of the protocol; however, by choosing the size of the bundle 
cleverly, we can bound the size of this increase. 

Definition 3.18. Let $\pi$ be a communication protocol and let $v_1, v_2, \ldots, v_k$ be one possible
computation path for $\pi$. An alternation in this computation path is an index $i$ where the
signals at $v_i$ and $v_{i+1}$ are sent by different players. The number of alternations in $\pi$ is the
maximum number of alternations over all computation paths of $\pi$.


Theorem 3.19. Let $\pi$ be a communication protocol with information cost $C$ that only uses $Q$
distinct signals. Then, for any $\varepsilon > 0$, there exists a communication protocol $\pi'$ that computes
the same function as $\pi$ with information cost at most $C + 2\varepsilon$ but that uses at most

$$W = \left(\frac{2Q\log N}{\varepsilon} + Q\right)\frac{\log N}{\varepsilon}$$

alternations.


Proof. Label our $Q$ different signals $\sigma^{(1)}$ through $\sigma^{(Q)}$. We will reduce the number of
alternations in $\pi$ by bundling signals of the same type in large groups. That is, if Alice (at a
specific point in the protocol) would send Bob signal $\sigma^{(i)}$, she instead sends Bob $t$ copies of
signal $\sigma^{(i)}$ (for an appropriately chosen $t$). Then, the next $t - 1$ times in the protocol that Alice
would send Bob signal $\sigma^{(i)}$, Bob instead refers to one of the unused $t$ copies Alice originally
sent. Once these $t$ copies are depleted and the protocol calls for a $(t+1)$st copy, the process
repeats and Alice sends a new bundle to Bob (possibly with a different value for $t$).

We choose $t$ as follows. Without loss of generality, assume Alice is sending a bundle of
signals $\sigma$ to Bob. Let $\Pi_{pre}$ be the transcript of the protocol thus far. Let $X^t = (X_1, X_2, \ldots, X_t)$
be a random variable corresponding to $t$ independently generated outputs of $\sigma$. We consider
three cases:


• Case 1: It is the case that

$$I(A; X^1|\Pi_{pre} B) > \frac{\varepsilon}{Q}.$$

In this case we set $t = 1$ (note that this is equivalent to simply following the original
protocol).

• Case 2: There exists a positive $t_0$ such that

$$\frac{\varepsilon}{2Q} \le I(A; X^{t_0}|\Pi_{pre} B) \le \frac{\varepsilon}{Q}.$$

In this case, we set $t = t_0$.

• Case 3: For all positive $t$,

$$I(A; X^t|\Pi_{pre} B) < \frac{\varepsilon}{2Q}.$$

In this case, we set $t$ to be the maximum number of times signal $\sigma$ is ever sent in
protocol $\pi$.
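The three-way case split can be sketched in code as follows. This is an illustration only: `info` is a hypothetical callable standing in for the map t ↦ I(A; X^t | Π_pre B), which the paper does not compute this way, and the thresholds ε/Q and ε/(2Q) are the ones used in the cases above. The search for t₀ is truncated at `max_uses`, which is an assumption of this sketch; a bundle larger than the number of times the signal is ever sent would never be fully consumed anyway.

```python
def choose_bundle_size(info, eps, q, max_uses):
    """Return (case, t) for one bundle.

    info(t)  -- hypothetical stand-in for I(A; X^t | Pi_pre B), non-decreasing in t
    eps, q   -- the error budget and the number Q of distinct signals
    max_uses -- maximum number of times this signal is ever sent in the protocol
    """
    upper, lower = eps / q, eps / (2 * q)
    if info(1) > upper:
        return 1, 1                       # Case 1: a single copy is already costly
    for t in range(1, max_uses + 1):
        if lower <= info(t) <= upper:
            return 2, t                   # Case 2: first t landing inside the window
    return 3, max_uses                    # Case 3: even a full bundle stays cheap
```

The window [ε/(2Q), ε/Q] is what drives the rest of the argument: its upper end caps the information leaked by unused copies, and its lower end guarantees that every Case 1 or Case 2 bundle pays a fixed minimum amount of information, which limits how many such bundles can be sent.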

The remainder of this proof is divided into three parts. In the first part, we argue that
the three cases above are comprehensive. In the second part, we argue that the information
cost of this new protocol is at most $C + \varepsilon$. Finally, in the third part we argue that this
bundling process decreases the total number of alternations to at most $W$.

Cases are comprehensive. We first argue that the three above cases indeed encompass all
possibilities. In particular, the function $I(A; X^t|\Pi_{pre} B)$ is non-decreasing in $t$, so it suffices
to show that there does not exist a $t$ for which

$$I(A; X^t|\Pi_{pre} B) < \frac{\varepsilon}{2Q} \quad\text{and}\quad \frac{\varepsilon}{Q} < I(A; X^{t+1}|\Pi_{pre} B).$$

To show this, we claim that $I(A; X^{t+1}|\Pi_{pre} B) - I(A; X^t|\Pi_{pre} B)$ is (weakly) decreasing
in $t$. This follows from the following chain of inequalities:

$$\begin{aligned}
I(A; X^{t+1}|\Pi_{pre} B) - I(A; X^t|\Pi_{pre} B) &= I(A; X_{t+1}|\Pi_{pre} B X^t) && (3.13) \\
&\le I(A; X_{t+1}|\Pi_{pre} B X^{t-1}) && (3.14) \\
&= I(A; X_t|\Pi_{pre} B X^{t-1}) && (3.15) \\
&= I(A; X^t|\Pi_{pre} B) - I(A; X^{t-1}|\Pi_{pre} B) && (3.16)
\end{aligned}$$

Here, inequality 3.14 follows from noting that $I(X_{t+1}; X_t|\Pi_{pre} A B X^{t-1}) = 0$ (indeed, $X_{t+1}$
and $X_t$ are conditionally independent given $A$ and $\Pi_{pre}$) and applying Lemma 2.5.

Now, note that if $I(A; X^1|\Pi_{pre} B)$ is at least $\frac{\varepsilon}{2Q}$, we are in either Case 1 or Case 2.
Therefore, assume that $I(A; X^1|\Pi_{pre} B) < \frac{\varepsilon}{2Q}$. Since $I(A; X^1|\Pi_{pre} B) - I(A; X^0|\Pi_{pre} B) =
I(A; X^1|\Pi_{pre} B)$, we have that $I(A; X^{t+1}|\Pi_{pre} B) - I(A; X^t|\Pi_{pre} B) < \frac{\varepsilon}{2Q}$ for all $t$. It follows that it is
impossible for $I(A; X^t|\Pi_{pre} B)$ to be less than $\frac{\varepsilon}{2Q}$ while $I(A; X^{t+1}|\Pi_{pre} B)$ is larger than $\frac{\varepsilon}{Q}$.

Information leakage is small. Let $\Pi$ be a random variable corresponding to the transcript
of our old protocol, and let $\Pi'$ be a random variable corresponding to the transcript of our
new protocol. Note that the only difference between $\Pi'$ and $\Pi$ is that $\Pi'$ contains some
excess signals in the form of unfinished bundles.

For each $i$, let $R_i$ be the random variable corresponding to the excess signals of type $\sigma^{(i)}$
(in particular, $R_i$ is of the form $(X_{u+1}, \ldots, X_t)$ if $u$ out of the $t$ signals in this bundle were
used). For $1 \le j \le Q$, let $R^j = (R_1, R_2, \ldots, R_j)$. We can then write $\Pi' = \Pi R^Q$, from which
it follows

$$\begin{aligned}
I(A; \Pi'|B) &= I(A; \Pi R^Q|B) \\
&= I(A; \Pi|B) + \sum_{i=1}^{Q} I(A; R_i|\Pi B R^{i-1}) \\
&\le I(A; \Pi|B) + \sum_{i=1}^{Q} I(A; R_i|\Pi B).
\end{aligned}$$

The last inequality follows from observing that $I(R_i; R^{i-1}|\Pi A B) = 0$ and applying
Lemma 2.5.

We would now like to show that, for each $i$, $I(A; R_i|\Pi B) \le \frac{\varepsilon}{Q}$. To do this, we will define
$Y_i$ to be the random variable given by

$$Y_i = (Z_2, \ldots, Z_u, X_{u+1}, X_{u+2}, \ldots, X_t).$$

Here, $X_{u+1}$ through $X_t$ are the elements of $R_i$ (the unused signals in bundle $i$), and $Z_2$
through $Z_u$ are independently sampled Bernoulli random variables with probability $\sigma^{(i)}(A)$
(that is, individually, they are distributed identically to each individual $X_j$ yet independent
from $\Pi$ and $X^t$ given $A$). The motivation behind this construction is to pad $R_i$ with additional
elements (identically distributed to, yet independent from the $X_j$) so as to avoid revealing
information about the number of unused signals in the bundle (which itself is a random
variable that might reveal information about $A$ or $B$).

Define the random variable $U$ to equal $t - |R_i|$. To begin, note that since the first signal in
a bundle is always used, we can always recover $R_i$ given $Y_i$ and $U$ (in particular, $R_i$ is a suffix
of $Y_i$ of length $t - U$), so $I(A; R_i|\Pi B) \le I(A; U Y_i|\Pi B) = I(A; Y_i|\Pi B) + I(A; U|\Pi B Y_i)$. Since
$U$ is recoverable given $\Pi$, $I(A; U|\Pi B Y_i) = 0$, and therefore $I(A; R_i|\Pi B) \le I(A; Y_i|\Pi B)$.

Next, let $\Pi_{pre}$ be the prefix of $\Pi$ up to the point where the last bundle for $\sigma^{(i)}$ was
created, and let $\Pi_{fin}$ be the remainder of the transcript (so $\Pi = \Pi_{pre}\Pi_{fin}$). We claim that
$I(A; Y_i|\Pi B) \le I(A; Y_i|\Pi_{pre} B)$. Again, this follows from observing that $I(Y_i; \Pi_{fin}|\Pi_{pre} A B) =
0$ and applying Lemma 2.5. In particular, conditioned on $\Pi_{pre}$, $A$, and $B$, $\Pi_{fin}$ is simply
some (randomized) function of $X_1$ through $X_u$, and hence independent from $Y_i$.

Finally, if this bundle is a Case 1 bundle, then $t = 1$ and $Y_i$ is empty, so $I(A; Y_i|\Pi_{pre} B) =
0$. Otherwise, note that conditioned on $B$ and $\Pi_{pre}$, $Y_i$ is distributed identically to $X^{[2,t]} =
(X_2, X_3, \ldots, X_t)$ (in particular, again they are both just $t - 1$ independent copies of a Bernoulli
random variable with probability $\sigma^{(i)}(A)$). Since $X^t$ is a superset of $X^{[2,t]}$, it follows that
$I(A; Y_i|\Pi_{pre} B) = I(A; X^{[2,t]}|\Pi_{pre} B) \le I(A; X^t|\Pi_{pre} B)$, which is at most $\frac{\varepsilon}{Q}$.

If $Q_A$ of the $Q$ distinct types of signal are sent by Alice, it follows from this argument
that $I(A; \Pi'|B) \le I(A; \Pi|B) + \frac{Q_A}{Q}\varepsilon$. Since a similar argument establishes that $I(B; \Pi'|A) \le
I(B; \Pi|A) + \frac{Q - Q_A}{Q}\varepsilon$, it immediately follows that $IC_\mu(\pi') \le IC_\mu(\pi) + \varepsilon$.

Number of alternations is small. Since alternations only occur between bundles, to show
that the number of alternations is at most $W$, it suffices to show that the number of bundles
sent in an execution of $\pi'$ is at most $W$. To do this, we will modify protocol $\pi'$ by aborting
after the $W$th bundle is sent and forcing Alice and Bob to exchange their inputs at this
point. We will show that the probability the protocol $\pi'$ uses at least $W$ bundles is at most
$\frac{\varepsilon}{\log N}$; since the information cost of any protocol is bounded above by $\log N$, this results in
an increase of at most $\frac{\varepsilon}{\log N}\log N = \varepsilon$ in the information cost of our protocol. Combining
this with the previous section of the proof, this results in a protocol with information cost
at most $IC_\mu(\pi) + 2\varepsilon$.

Let $M_i$ be the $i$th bundle sent in the protocol, and let $M^i = (M_1, M_2, \ldots, M_i)$ be the list
of the first $i$ bundles sent in the protocol. Note that

$$\begin{aligned}
&\sum_i \left(I(A; M_{i+1}|M^i B) + I(B; M_{i+1}|M^i A)\right) && (3.17) \\
&\quad= \sum_i \left(I(A; M^{i+1}|B) - I(A; M^i|B) + I(B; M^{i+1}|A) - I(B; M^i|A)\right) && (3.18) \\
&\quad= I(A; \Pi'|B) + I(B; \Pi'|A) && (3.19) \\
&\quad= IC_\mu(\pi') && (3.20) \\
&\quad\le \log N && (3.21)
\end{aligned}$$

Let $p_i$ be the probability that at least $i$ bundles are sent under $\pi'$, and let $p_{i,3}$ be the
probability that the $i$th bundle sent is a Case 3 bundle. Then, since each non-Case 3 bundle
contributes at least $\frac{\varepsilon}{2Q}$ to the information cost of the protocol,

$$I(A; M_{i+1}|M^i B) + I(B; M_{i+1}|M^i A) \ge (p_i - p_{i,3})\frac{\varepsilon}{2Q}. \tag{3.22}$$

Summing equation 3.22 over all $i$ (and combining with inequality 3.21), we have that

$$\begin{aligned}
\log N &\ge I(A; \Pi'|B) + I(B; \Pi'|A) \\
&\ge \sum_i \left(I(A; M_{i+1}|M^i B) + I(B; M_{i+1}|M^i A)\right) \\
&\ge \frac{\varepsilon}{2Q}\sum_i (p_i - p_{i,3}).
\end{aligned}$$

Note that $\sum_i p_{i,3}$ is the expected number of Case 3 bundles sent. Since we send at most
one Case 3 bundle of each type, this sum is at most $Q$. It follows that

$$\sum_i p_i \le \frac{2Q\log N}{\varepsilon} + Q.$$

Finally, since the $p_i$ are non-increasing, the probability we send at least $W$ bundles is at
most

$$p_W \le \frac{1}{W}\sum_i p_i \le \frac{\varepsilon}{\log N}$$

as desired.

□
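The arithmetic behind the choice of W can be spelled out in a few lines. The sketch below is a worked calculation under the formulas of this proof; the function name is hypothetical, and logarithms are taken base 2 as an assumption.

```python
import math


def alternation_bound(q, n, eps):
    """Return (bound on the expected number of bundles, alternation bound W)."""
    log_n = math.log2(n)
    expected_bundles = 2 * q * log_n / eps + q    # bound on sum_i p_i
    w = expected_bundles * log_n / eps            # W from Theorem 3.19
    return expected_bundles, w
```

Dividing the first returned value by the second gives exactly ε / log N, which is the probability bound used for the abort step: aborting after the W-th bundle then costs at most (ε / log N) · log N = ε extra information.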


3.6 Computing Information Complexity 

Combining the results of Theorems 3.1, 3.4, 3.8, 3.13, and 3.19, we obtain the following
result.

Theorem 3.20. Let $\pi$ be a communication protocol with information cost $C$ that successfully
computes function $f$ over inputs drawn from distribution $\mu$ over $A \times B$. Then there exists a
protocol $\pi'$ with information cost at most $C + \varepsilon$ that also successfully computes $f$ over inputs
drawn from $\mu$, but that uses at most $w(f, \varepsilon)$ alternations where

$$w(f, \varepsilon) = (N\varepsilon^{-1})^{O(N)} \tag{3.23}$$

where $N = |A \times B|$.

Definition 3.21. Let the $W$-alternation information cost of $f$, $IC_{W,\mu}(f)$, equal $\inf_\pi IC_\mu(\pi)$,
where the infimum is taken over all protocols $\pi$ that successfully compute $f$ and use at most
$W$ alternations.

Corollary 3.22. We have that

$$IC_\mu(f) \le IC_{w(f,\varepsilon),\mu}(f) \le IC_\mu(f) + \varepsilon.$$

The following result of Braverman and Rao provides a link between the communication
complexity and information complexity of protocols restricted to at most $W$ alternations.

Lemma 3.23. Let $\pi$ be a protocol with information cost $I$ that uses at most $W$ alternations.
Then, for every $\varepsilon > 0$, there exists a protocol $\tau$ such that

i) with probability at least $1 - \varepsilon$, at the end of protocol $\tau$, Alice and Bob output a valid
transcript for $\pi$ (distributed identically to $\pi(A, B)$).

ii) the communication cost of $\tau$ is at most $I + O(\sqrt{WI} + W) + 2W\log(W/\varepsilon)$.

Proof. See Corollary 2.2 of jT] . □ 

Define $IC_\mu(f, \varepsilon)$ to be equal to $\inf_\pi IC_\mu(\pi)$, where the infimum is taken over all protocols
$\pi$ that successfully compute $f$ with probability at least $1 - \varepsilon$. The following lemma relates
$IC_\mu(f, \varepsilon)$ to $IC_\mu(f)$.

Lemma 3.24. For all $\varepsilon \in (0, \rho^8)$ (where as before, $\rho = \min_{a,b} \mu(a,b)$),

$$IC_\mu(f) \le IC_\mu(f, \varepsilon) + 2\left(h\!\left(1 - \frac{2N\varepsilon^{1/4}}{\rho}\right) + 2(\log N + 2)\frac{N\varepsilon^{1/4}}{\rho}\right).$$

Proof. See Lemma 6.3 in [5]. □
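To get a feel for how quickly the slack term in Lemma 3.24 shrinks, one can evaluate it numerically. The sketch below is an illustration, not part of the paper: it assumes the slack has the form 2(h(1 - 2Nε^{1/4}/ρ) + 2(log N + 2)Nε^{1/4}/ρ) with h the binary entropy function and base-2 logarithms, and the function names are hypothetical.

```python
import math


def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def lower_slack(eps, n, rho):
    """Assumed form of the slack term; requires eps < rho ** 8."""
    x = n * eps ** 0.25 / rho
    return 2 * (binary_entropy(1 - 2 * x) + 2 * (math.log2(n) + 2) * x)
```

For instance, with N = 4 and ρ = 0.1, shrinking ε from 1e-12 to 1e-16 to 1e-20 reduces the slack at each step. Because ε enters only through its fourth root, ε has to be very small before the slack becomes negligible, which is why the final choice of ε in the main proof is a high power of the target accuracy.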

Likewise, define $CC_\mu(f, \varepsilon)$ to be equal to $\inf_\pi CC(\pi)$, where the infimum is taken over
all protocols $\pi$ that successfully compute $f$ with probability at least $1 - \varepsilon$ (when inputs are
drawn from distribution $\mu$). The following theorem relates $IC_\mu(f, \varepsilon)$ to $CC_\mu(f, \varepsilon)$.

Theorem 3.25. Let

$$L_\mu(f, \varepsilon) = 2\left(h\!\left(1 - \frac{2N\varepsilon^{1/4}}{\rho}\right) + 2(\log N + 2)\frac{N\varepsilon^{1/4}}{\rho}\right)$$

and

$$U_\mu(f, \varepsilon) = \varepsilon + O\!\left(\sqrt{w(f,\varepsilon)\,(IC_\mu(f) + \varepsilon)} + w(f,\varepsilon)\right) + 2w(f,\varepsilon)\log\frac{w(f,\varepsilon)}{\varepsilon}.$$

We have that

$$IC_\mu(f) - L_\mu(f, \varepsilon) \le CC_\mu(f, \varepsilon) \le IC_\mu(f) + U_\mu(f, \varepsilon).$$

Proof. To prove the left inequality, first note that $IC_\mu(f, \varepsilon) \le CC_\mu(f, \varepsilon)$; this follows
from the fact that for any protocol $\pi$, $IC_\mu(\pi) \le CC(\pi)$. Then, by Lemma 3.24, $IC_\mu(f) \le
IC_\mu(f, \varepsilon) + L_\mu(f, \varepsilon)$, from which the left inequality follows.

To prove the right inequality, set $W = w(f, \varepsilon)$ in Lemma 3.23 and apply the fact that
$IC_{w(f,\varepsilon),\mu}(f) \le IC_\mu(f) + \varepsilon$. □

Theorem 3.25 shows that if we can compute the $\varepsilon$-error communication complexity of
$f$, we can approximate the information complexity of $f$ to within an additive factor of
$\max(L_\mu(f, \varepsilon), U_\mu(f, \varepsilon))$. Unfortunately, while we can make $L_\mu(f, \varepsilon)$ arbitrarily small by
decreasing $\varepsilon$, $U_\mu(f, \varepsilon)$ may be large. To remedy this, we apply the following direct sum results.

Lemma 3.26. We have that

$$IC_\mu(f^n, \varepsilon) = n\, IC_\mu(f, \varepsilon).$$

In particular, for $\varepsilon = 0$,

$$IC_\mu(f^n) = n\, IC_\mu(f).$$

Proof. See Theorem 4.3 in [3]. □

Lemma 3.27. We have that

$$IC_{w(f,\varepsilon),\mu}(f^n) \le n\, IC_{w(f,\varepsilon),\mu}(f).$$

Proof. Let $\pi$ be the protocol that computes $f$ which requires $w(f, \varepsilon)$ rounds and has
information cost $IC_{w(f,\varepsilon),\mu}(f)$. By running $n$ copies of $\pi$ in parallel, we can construct a protocol
for $f^n$ which still requires only $w(f, \varepsilon)$ rounds and has information cost $n\, IC_{w(f,\varepsilon),\mu}(f)$. The
result follows. □

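Corollary 3.28. We have that

$$CC_\mu(f^n, \varepsilon) \le IC_\mu(f^n) + U_{n,\mu}(f, \varepsilon) \tag{3.24}$$

where

$$U_{n,\mu}(f, \varepsilon) = n\varepsilon + O\!\left(\sqrt{n \cdot w(f,\varepsilon)\,(IC_\mu(f) + \varepsilon)} + w(f,\varepsilon)\right) + 2w(f,\varepsilon)\log\frac{w(f,\varepsilon)}{\varepsilon}. \tag{3.25}$$

Proof. We follow the proof of the upper bound in Theorem 3.25, but instead of setting
$W = w(f^n, \varepsilon)$ in Lemma 3.23 we set $W = w(f, \varepsilon)$. Then, by Lemma 3.27, $IC_{w(f,\varepsilon),\mu}(f^n) \le
n\, IC_{w(f,\varepsilon),\mu}(f) \le n(IC_\mu(f) + \varepsilon)$. Applying this fact, we obtain our desired result. □

Corollary 3.29. For any $n \ge 1$,

$$IC_\mu(f) - L_\mu(f, \varepsilon) \le \frac{CC_\mu(f^n, \varepsilon)}{n} \le IC_\mu(f) + \frac{U_{n,\mu}(f, \varepsilon)}{n}.$$

Proof. To prove the lower bound, recall that $IC_\mu(f^n, \varepsilon) \le CC_\mu(f^n, \varepsilon)$. Dividing by $n$ and
applying the result of Lemma 3.26, we obtain that $IC_\mu(f, \varepsilon) \le \frac{CC_\mu(f^n, \varepsilon)}{n}$. Applying the lower
bound from Theorem 3.25, we have that $IC_\mu(f) - L_\mu(f, \varepsilon) \le IC_\mu(f, \varepsilon)$, and hence that

$$IC_\mu(f) - L_\mu(f, \varepsilon) \le \frac{CC_\mu(f^n, \varepsilon)}{n}. \tag{3.26}$$

To show the upper bound, simply divide both sides of Corollary 3.28 by $n$ and apply
Lemma 3.26. □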

We can now proceed to prove our main theorem.

Proof of Theorem 1.1. Fix an $\alpha > 0$; we will show how to approximate the information
complexity of $f$ to within an additive factor of $\alpha$. First, note that since $L_\mu(f, \varepsilon)$ decreases to 0
as $\varepsilon \to 0$, we can choose $\varepsilon$ small enough so that both $L_\mu(f, \varepsilon) < \alpha$ and $\varepsilon < \frac{\alpha}{2}$. Secondly, note that

$$U_{n,\mu}(f, \varepsilon) \le n\varepsilon + \sqrt{n}\, U_\mu(f, \varepsilon). \tag{3.27}$$

It follows from Corollary 3.29 that

$$\frac{CC_\mu(f^n, \varepsilon)}{n} \le IC_\mu(f) + \varepsilon + \frac{U_\mu(f, \varepsilon)}{\sqrt{n}}. \tag{3.28}$$

If we choose $n$ large enough so that $\frac{U_\mu(f, \varepsilon)}{\sqrt{n}} \le \frac{\alpha}{2}$, then it follows from Corollary 3.29 that
$CC_\mu(f^n, \varepsilon)/n$ approximates $IC_\mu(f)$ to within an additive factor of $\alpha$.

Note that $n \log N$ is an upper bound on $CC_\mu(f^n, \varepsilon)$. We can therefore compute $CC_\mu(f^n, \varepsilon)$
simply by enumerating all protocols of depth at most $n \log N$, checking which protocols
compute $f^n$ successfully at least a $(1 - \varepsilon)$ proportion of the time, and taking the minimal
communication cost of such protocols. This completes the proof that information complexity
is computable.

To obtain explicit asymptotic bounds on $n$, note that to ensure $L_\mu(f, \varepsilon) < \alpha/2$, it suffices
to take $\varepsilon = O(\alpha^8 N^{-4} \rho^4) = O(\alpha^{16} N^{-8})$ (by the proof of Theorem 3.1 we know that we can
ensure $\rho = \Omega(\alpha^2 N^{-1})$). For this value of $\varepsilon$, in order to choose $n$ so that $U_\mu(f, \varepsilon)/\sqrt{n} < \alpha/2$, it
suffices to take

$$n = O\!\left(w(f, \varepsilon)\log w(f, \varepsilon)/\alpha^2\right) = (N\alpha^{-1})^{O(N)}$$

and hence it suffices to enumerate protocols with up to a maximum depth $d$ on the order of
$(N\alpha^{-1})^{O(N)}$. The number of such protocols is at most

$$2^{N 2^d} = 2^{\exp\left((N\alpha^{-1})^{O(N)}\right)}.$$

Since each protocol with depth $d$ can be checked for correctness in time $O(Nd)$ (by
checking all possible $N$ pairs of inputs), this is also a bound on the time complexity of this
algorithm, as desired. □
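The enumeration step at the end of this proof can be made concrete for very small inputs. The sketch below is an illustration rather than the paper's algorithm: it assumes deterministic protocol trees whose leaf label is the output, measures cost by tree depth, and replaces explicit enumeration with a memoized search over rectangles. The function name `distributional_cc` and the dictionary encoding of `f` and `mu` are hypothetical, and the search is exponential in the input sizes.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations


def distributional_cc(f, mu, eps):
    """Smallest depth of a deterministic protocol tree that answers correctly
    on at least 1 - eps of the mu-mass. f and mu are dicts keyed by (a, b)."""
    rows = frozenset(a for a, _ in f)
    cols = frozenset(b for _, b in f)

    def splits(s):
        # Every way to cut s into two non-empty parts, each unordered pair once.
        items = sorted(s)
        first, rest = items[0], items[1:]
        for r in range(len(rest) + 1):
            for extra in combinations(rest, r):
                part = frozenset((first,) + extra)
                if part != s:
                    yield part, s - part

    @lru_cache(maxsize=None)
    def best(rs, cs, depth):
        # Largest mu-mass answered correctly inside the rectangle rs x cs.
        mass = {}
        for a in rs:
            for b in cs:
                mass[f[a, b]] = mass.get(f[a, b], Fraction(0)) + mu[a, b]
        top = max(mass.values())               # leaf: output the heaviest value
        if depth == 0:
            return top
        for part, other in splits(rs):         # Alice sends one bit
            top = max(top, best(part, cs, depth - 1) + best(other, cs, depth - 1))
        for part, other in splits(cs):         # Bob sends one bit
            top = max(top, best(rs, part, depth - 1) + best(rs, other, depth - 1))
        return top

    target = 1 - Fraction(eps)
    depth = 0
    while best(rows, cols, depth) < target:
        depth += 1
    return depth
```

For the two-bit AND function under the uniform distribution, a protocol that always outputs 0 is correct on three quarters of the inputs, so the search returns depth 0 for ε = 1/4 and depth 2 for ε = 0. Restricting to deterministic trees is enough for a fixed input distribution, since a randomized protocol can be replaced by its best deterministic fixing of the random bits without increasing the average error.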

Remark 3.30. The techniques in this section do not immediately extend to the case of
external information complexity (in particular, no direct sum statement analogous to Lemma
3.26 is known for external information complexity). Instead, to prove the analogue of
Theorem 1.1 for external information complexity, we can proceed from Theorem 3.20 by applying
the results of Ma and Ishwar to approximate arbitrarily closely the $W$-alternation external
information complexity of $f$ (see section II.B of [14]).